mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-07 05:56:41 +02:00
Merge remote-tracking branch 'origin/main' into garrytan/community-mode
This commit is contained in:
@@ -342,21 +342,49 @@ If `NEEDS_SETUP`:
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
## Step 0: Pre-flight cleanup

Before connecting, kill any stale browse servers and clean up lock files that
may have persisted from a crash. This prevents "already connected" false
positives and Chromium profile lock conflicts.

```bash
# Kill any existing browse server
if [ -f "$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json" ]; then
  _OLD_PID=$(grep -o '"pid":[0-9]*' "$(git rev-parse --show-toplevel)/.gstack/browse.json" 2>/dev/null | grep -o '[0-9]*')
  [ -n "$_OLD_PID" ] && kill "$_OLD_PID" 2>/dev/null || true
  sleep 1
  [ -n "$_OLD_PID" ] && kill -9 "$_OLD_PID" 2>/dev/null || true
  rm -f "$(git rev-parse --show-toplevel)/.gstack/browse.json"
fi

# Clean Chromium profile locks (can persist after crashes)
_PROFILE_DIR="$HOME/.gstack/chromium-profile"
for _LF in SingletonLock SingletonSocket SingletonCookie; do
  rm -f "$_PROFILE_DIR/$_LF" 2>/dev/null || true
done
echo "Pre-flight cleanup done"
```

## Step 1: Connect

```bash
$B connect
```

This launches Playwright's bundled Chromium in headed mode with:
- A visible window you can watch (not your regular Chrome — it stays untouched)
- The gstack Chrome extension auto-loaded via `launchPersistentContext`
- A golden shimmer line at the top of every page so you know which window is controlled
- A sidebar agent process for chat commands

If a server is already connected, it restarts in headed mode with a fresh
browser instance. Your regular Chrome stays untouched.

The `connect` command auto-discovers the extension from the gstack install
directory. It always uses port **34567** so the extension can auto-connect.

After connecting, print the full output to the user. Confirm you see
`Mode: headed` in the output.

If the output shows an error or the mode is not `headed`, run `$B status` and
share the output with the user before proceeding.

## Step 2: Verify

@@ -364,27 +392,41 @@ After connecting, print the output to the user.
```bash
$B status
```

Confirm the output shows `Mode: headed`. Read the port from the state file:

```bash
grep -o '"port":[0-9]*' "$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json" 2>/dev/null | grep -o '[0-9]*'
```

The port should be **34567**. If it's different, note it — the user may need it
for the Side Panel.

Also find the extension path so you can help the user if they need to load it manually:

```bash
_EXT_PATH=""
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
[ -n "$_ROOT" ] && [ -f "$_ROOT/.agents/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$_ROOT/.agents/skills/gstack/extension"
[ -z "$_EXT_PATH" ] && [ -f "$HOME/.agents/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$HOME/.agents/skills/gstack/extension"
echo "EXTENSION_PATH: ${_EXT_PATH:-NOT FOUND}"
```

## Step 3: Guide the user to the Side Panel

Use AskUserQuestion:

> Chrome is launched with gstack control. You should see Playwright's Chromium
> (not your regular Chrome) with a golden shimmer line at the top of the page.
>
> The Side Panel extension should be auto-loaded. To open it:
> 1. Look for the **puzzle piece icon** (Extensions) in the toolbar — it may
>    already show the gstack icon if the extension loaded successfully
> 2. Click the **puzzle piece** → find **gstack browse** → click the **pin icon**
> 3. Click the pinned **gstack icon** in the toolbar
> 4. The Side Panel should open on the right showing a live activity feed
>
> The Side Panel shows a live feed of every browse command in real time.
>
> **Port:** 34567 (auto-detected — the extension connects automatically in the
> Playwright-controlled Chrome).

Options:
- A) I can see the Side Panel — let's go!
@@ -392,22 +434,34 @@ Options:
- C) Something went wrong

If B: Tell the user:

> The extension is loaded into Playwright's Chromium at launch time, but
> sometimes it doesn't appear immediately. Try these steps:
>
> 1. Type `chrome://extensions` in the address bar
> 2. Look for **"gstack browse"** — it should be listed and enabled
> 3. If it's there but not pinned, go back to any page, click the puzzle piece
>    icon, and pin it
> 4. If it's NOT listed at all, click **"Load unpacked"** and navigate to:
>    - Press **Cmd+Shift+G** in the file picker dialog
>    - Paste this path: `{EXTENSION_PATH}` (use the path from Step 2)
>    - Click **Select**
>
> After loading, pin it and click the icon to open the Side Panel.
>
> If the Side Panel badge stays gray (disconnected), click the gstack icon
> and enter port **34567** manually.

If C:

1. Run `$B status` and show the output
2. If the server is not healthy, re-run Step 0 cleanup + Step 1 connect
3. If the server IS healthy but the browser isn't visible, try `$B focus`
4. If that fails, ask the user what they see (error message, blank screen, etc.)

## Step 4: Demo

After the user confirms the Side Panel is working, run a quick demo:

```bash
$B goto https://news.ycombinator.com
@@ -420,7 +474,7 @@ $B snapshot -i
```

Tell the user: "Check the Side Panel — you should see the `goto` and `snapshot`
commands appear in the activity feed. Every command Claude runs shows up here
in real time."

## Step 5: Sidebar chat
@@ -428,8 +482,9 @@ in real time."

After the activity feed demo, tell the user about the sidebar chat:

> The Side Panel also has a **chat tab**. Try typing a message like "take a
> snapshot and describe this page." A sidebar agent (a child Claude instance)
> executes your request in the browser — you'll see the commands appear in
> the activity feed as they happen.
>
> The sidebar agent can navigate pages, click buttons, fill forms, and read
> content. Each task gets up to 5 minutes. It runs in an isolated session, so
@@ -439,17 +494,28 @@ After the activity feed demo, tell the user about the sidebar chat:

Tell the user:

> You're all set! Here's what you can do with the connected Chrome:
>
> **Watch Claude work in real time:**
> - Run any gstack skill (`/qa`, `/design-review`, `/benchmark`) and watch
>   every action happen in the visible Chrome window + Side Panel feed
> - No cookie import needed — the Playwright browser shares its own session
>
> **Control the browser directly:**
> - **Sidebar chat** — type natural language in the Side Panel and the sidebar
>   agent executes it (e.g., "fill in the login form and submit")
> - **Browse commands** — `$B goto <url>`, `$B click <sel>`, `$B fill <sel> <val>`,
>   `$B snapshot -i` — all visible in Chrome + Side Panel
>
> **Window management:**
> - `$B focus` — bring Chrome to the foreground anytime
> - `$B disconnect` — close headed Chrome and return to headless mode
>
> **What skills look like in headed mode:**
> - `/qa` runs its full test suite in the visible browser — you see every page
>   load, every click, every assertion
> - `/design-review` takes screenshots in the real browser — same pixels you see
> - `/benchmark` measures performance in the headed browser

Then proceed with whatever the user asked to do. If they didn't specify a task,
ask what they'd like to test or browse.

@@ -1,5 +1,80 @@

# Changelog

## [0.12.8.1] - 2026-03-27 — zsh Glob Compatibility

Skill scripts now work correctly in zsh. Previously, bash code blocks in skill templates used raw glob patterns like `.github/workflows/*.yaml` and `ls ~/.gstack/projects/$SLUG/*-design-*.md` that would throw "no matches found" errors in zsh when no files matched. Fixed 38 instances across 13 templates and 2 resolvers using two approaches: `find`-based alternatives for complex patterns, and `setopt +o nomatch` guards for simple `ls` commands.

### Fixed

- **`.github/workflows/` globs replaced with `find`.** `cat .github/workflows/*deploy*`, `for f in .github/workflows/*.yml`, and `ls .github/workflows/*.yaml` patterns in `/land-and-deploy`, `/setup-deploy`, `/cso`, and the deploy bootstrap resolver now use `find ... -name` instead of raw globs.
- **`~/.gstack/` and `~/.claude/` globs guarded with `setopt`.** Design doc lookups, eval result listings, test plan discovery, and retro history checks across 10 skills now prepend `setopt +o nomatch 2>/dev/null || true` (no-op in bash, disables NOMATCH in zsh).
- **Test framework detection globs guarded.** `ls jest.config.* vitest.config.*` in the testing resolver now has a setopt guard.
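
The guard pattern above can be sketched as a tiny script (the paths here are hypothetical stand-ins, not the real skill templates):

```shell
# `setopt +o nomatch` tells zsh to pass an unmatched glob through literally
# instead of aborting with "no matches found"; in bash the command doesn't
# exist, so the `|| true` turns the whole guard into a no-op.
setopt +o nomatch 2>/dev/null || true

# Simple-glob case: guarded `ls` with an explicit fallback when nothing matches
DESIGN=$(ls -t /tmp/no-such-dir/*-design-*.md 2>/dev/null | head -1)
echo "design doc: ${DESIGN:-none found}"

# Complex-pattern case: `find` never globs, so it needs no guard at all
find .github/workflows -name '*deploy*' 2>/dev/null || true
```

Without the guard, zsh raises the glob error in the shell itself, before `ls` ever runs, so the `2>/dev/null` on the command alone is not enough.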

## [0.12.8.0] - 2026-03-27 — Codex No Longer Reviews the Wrong Project

When you run gstack in Conductor with multiple workspaces open, Codex could silently review the wrong project. The `codex exec -C` flag resolved the repo root inline via `$(git rev-parse --show-toplevel)`, which evaluates in whatever cwd the background shell inherits. In multi-workspace environments, that cwd might be a different project entirely.

### Fixed

- **Codex exec resolves repo root eagerly.** All 12 `codex exec` commands across `/codex`, `/autoplan`, and 4 resolver functions now resolve `_REPO_ROOT` at the top of each bash block and reference the stored value in `-C`. No more inline evaluation that races with other workspaces.
- **`codex review` also gets cwd protection.** `codex review` doesn't support `-C`, so it now gets `cd "$_REPO_ROOT"` before invocation. Same class of bug, different command.
- **Silent fallback replaced with hard fail.** The `|| pwd` fallback silently used whatever random cwd was available. Now it errors out with a clear message if not in a git repo.

### Removed

- **Dead resolver copies in gen-skill-docs.ts.** Six functions that were moved to `scripts/resolvers/` months ago but never deleted. They had already diverged from the live versions and contained the old vulnerable pattern.

### Added

- **Regression test** that scans all `.tmpl`, resolver `.ts`, and generated `SKILL.md` files for codex commands using inline `$(git rev-parse --show-toplevel)`. Prevents reintroduction.
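
A minimal version of such a scan might look like this; the sample files and directory layout are illustrative assumptions, not the project's actual test:

```shell
# Set up two sample files in a scratch dir: one clean, one with the bad pattern
DIR=$(mktemp -d)
printf 'codex exec "review" -C "$_REPO_ROOT"\n' > "$DIR/good.tmpl"
printf 'codex exec "review" -C "$(git rev-parse --show-toplevel)"\n' > "$DIR/bad.tmpl"

# The scan: any codex line that still resolves the repo root inline is a
# failure. `-F` matches the `$(...)` as a fixed string, so no escaping needed.
BAD=$(grep -rn 'codex ' "$DIR" | grep -F '$(git rev-parse --show-toplevel)' || true)
echo "offending lines: ${BAD:-none}"
```

Running this flags only `bad.tmpl`; the file that references the pre-resolved `$_REPO_ROOT` passes.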

## [0.12.7.0] - 2026-03-27 — Community PRs + Security Hardening

Seven community contributions merged, reviewed, and tested. Plus security hardening for telemetry and review logging, and E2E test stability fixes.

### Added

- **Dotfile filtering in skill discovery.** Hidden directories (`.git`, `.vscode`, etc.) are no longer picked up as skill templates.
- **JSON validation gate in review-log.** Malformed input is rejected instead of appended to the JSONL file.
- **Telemetry input sanitization.** All string fields are stripped of quotes, backslashes, and control characters before being written to JSONL.
- **Host-specific co-author trailers.** `/ship` and `/document-release` now use the correct co-author line for Codex vs Claude.
- **10 new security tests** covering telemetry injection, review-log validation, and dotfile filtering.

### Fixed

- **File paths starting with `./` no longer treated as CSS selectors.** `$B screenshot ./path/to/file.png` now works instead of trying to find a CSS element.
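
The disambiguation rule can be sketched as a tiny classifier — a hypothetical reduction of the fix, not the actual browse code:

```shell
# Anything that looks like a filesystem path is a screenshot destination;
# everything else is treated as a CSS selector.
classify() {
  case "$1" in
    ./*|../*|/*|~/*) echo "path" ;;
    *)               echo "selector" ;;
  esac
}
classify "./shots/home.png"   # path
classify "#submit"            # selector
```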
- **Build chain resilience.** `gen:skill-docs` failure no longer blocks binary compilation.
- **Update checker fall-through.** After upgrading, the checker now also checks for newer remote versions instead of stopping.
- **Flaky E2E tests stabilized.** `browse-basic`, `ship-base-branch`, and `review-dashboard-via` tests now pass reliably by extracting only relevant SKILL.md sections instead of copying full 1900-line files into test fixtures.
- **Removed unreliable `journey-think-bigger` routing test.** Never passed reliably because the routing signal was too ambiguous. 10 other journey tests cover routing with clear signals.

### For contributors

- New CLAUDE.md rule: never copy full SKILL.md files into E2E test fixtures. Extract the relevant section only.

## [0.12.6.0] - 2026-03-27 — Sidebar Knows What Page You're On

The Chrome sidebar agent used to navigate to the wrong page when you asked it to do something. If you'd manually browsed to a site, the sidebar would ignore that and go to whatever Playwright last saw (often Hacker News from the demo). Now it works.

### Fixed

- **Sidebar uses the real tab URL.** The Chrome extension now captures the actual page URL via `chrome.tabs.query()` and sends it to the server. Previously the sidebar agent used Playwright's stale `page.url()`, which didn't update when you navigated manually in headed mode.
- **URL sanitization.** The extension-provided URL is validated (http/https only, control characters stripped, 2048 char limit) before being used in the Claude system prompt. Prevents prompt injection via crafted URLs.
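
As a sketch of that validation — a shell approximation of the server-side check, not the actual implementation:

```shell
# Accept only http/https, strip control characters, cap length at 2048 bytes.
sanitize_url() {
  _u=$(printf '%s' "$1" | tr -d '[:cntrl:]' | head -c 2048)
  case "$_u" in
    http://*|https://*) printf '%s\n' "$_u" ;;
    *)                  echo "REJECTED" ;;
  esac
}
sanitize_url "https://example.com/login"   # passes through unchanged
sanitize_url "javascript:alert(1)"         # REJECTED
```

Scheme-allowlisting (rather than blocklisting `javascript:` and friends) is the part that actually closes the injection surface.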
- **Stale sidebar agents killed on reconnect.** Each `/connect-chrome` now kills leftover sidebar-agent processes before starting a new one. Old agents had stale auth tokens and would silently fail, causing the sidebar to freeze.

### Added

- **Pre-flight cleanup for `/connect-chrome`.** Kills stale browse servers and cleans Chromium profile locks before connecting. Prevents "already connected" false positives after crashes.
- **Sidebar agent test suite (36 tests).** Four layers: unit tests for URL sanitization, integration tests for server HTTP endpoints, mock-Claude round-trip tests, and E2E tests with real Claude. All free except layer 4.

## [0.12.5.1] - 2026-03-27 — Eng Review Now Tells You What to Parallelize

`/plan-eng-review` automatically analyzes your plan for parallel execution opportunities. When your plan has independent workstreams, the review outputs a dependency table, parallel lanes, and execution order so you know exactly which tasks to split into separate git worktrees.

### Added

- **Worktree parallelization strategy** in `/plan-eng-review` required outputs. Extracts a structured table of plan steps with module-level dependencies, computes parallel lanes, and flags merge conflict risks. Skips automatically for single-module or single-track plans.

## [0.12.5.0] - 2026-03-26 — Fix Codex Hangs: 30-Minute Waits Are Gone

Three bugs in `/codex` caused 30+ minute hangs with zero output during plan reviews and adversarial checks. All three are fixed.

@@ -298,6 +298,30 @@ them. Report progress at each check (which tests passed, which are running, any
failures so far). The user wants to see the run complete, not a promise that
you'll check later.

## E2E test fixtures: extract, don't copy

**NEVER copy a full SKILL.md file into an E2E test fixture.** SKILL.md files are
1500-2000 lines. When `claude -p` reads a file that large, context bloat causes
timeouts, flaky turn limits, and tests that take 5-10x longer than necessary.

Instead, extract only the section the test actually needs:

```typescript
// BAD — agent reads 1900 lines, burns tokens on irrelevant sections
fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dir, 'ship-SKILL.md'));

// GOOD — agent reads ~60 lines, finishes in 38s instead of timing out
const full = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
const start = full.indexOf('## Review Readiness Dashboard');
const end = full.indexOf('\n---\n', start);
fs.writeFileSync(path.join(dir, 'ship-SKILL.md'), full.slice(start, end > start ? end : undefined));
```

Also when running targeted E2E tests to debug failures:
- Run in **foreground** (`bun test ...`), not background with `&` and `tee`
- Never `pkill` running eval processes and restart — you lose results and waste money
- One clean run beats three killed-and-restarted runs

## Deploying to the active skill

The active skill lives at `~/.claude/skills/gstack/`. After making changes:

@@ -185,6 +185,18 @@ Sidebar agent writes structured messages to `.context/sidebar-inbox/`. Workspace
**Priority:** P3
**Depends on:** Headed mode (shipped)

### Sidebar agent needs Write tool + better error visibility

**What:** Two issues with the sidebar agent (`sidebar-agent.ts`): (1) `--allowedTools` is hardcoded to `Bash,Read,Glob,Grep`, missing `Write`. Claude can't create files (like CSVs) when asked. (2) When Claude errors or returns empty, the sidebar UI shows nothing, just a green dot. No error message, no "I tried but failed", nothing.

**Why:** Users ask "write this to a CSV" and the sidebar silently can't. Then they think it's broken. The UI needs to surface errors visibly, and Claude needs the tools to actually do what's asked.

**Context:** `sidebar-agent.ts:163` hardcodes `--allowedTools`. The event relay (`handleStreamEvent`) handles `agent_done` and `agent_error` but the extension's sidepanel.js may not be rendering error states. The sidebar should show "Error: ..." or "Claude finished but produced no output" instead of staying on the green dot forever.

**Effort:** S (human: ~2h / CC: ~10min)
**Priority:** P1
**Depends on:** None

### Chrome Web Store publishing

**What:** Publish the gstack browse Chrome extension to Chrome Web Store for easier install.

@@ -408,6 +408,7 @@ If the Read fails (file not found), say:

After /office-hours completes, re-run the design doc check:
```bash
setopt +o nomatch 2>/dev/null || true  # zsh compat
SLUG=$(~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-' || echo 'no-branch')
DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
@@ -587,13 +588,16 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.

Run them simultaneously (Agent tool for subagent, Bash for Codex).

**Codex CEO voice** (via Bash):
```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "You are a CEO/founder advisor reviewing a development plan.
Challenge the strategic foundations: Are the premises valid or assumed? Is this the
right problem to solve, or is there a reframing that would be 10x more impactful?
What alternatives were dismissed too quickly? What competitive or market risks are
unaddressed? What scope decisions will look foolish in 6 months? Be adversarial.
No compliments. Just the strategic blind spots.
File: <plan_path>" -C "$_REPO_ROOT" -s read-only --enable web_search_cached
```
Timeout: 10 minutes

**Claude CEO subagent** (via Agent tool):

@@ -692,7 +696,9 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
- Dual voices: always run BOTH Claude subagent AND Codex if available (P6).

**Codex design voice** (via Bash):
```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Read the plan file at <plan_path>. Evaluate this plan's
UI/UX design decisions.

Also consider these findings from the CEO review phase:
@@ -704,7 +710,8 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
accessibility requirements (keyboard nav, contrast, touch targets) specified or
aspirational? Does the plan describe specific UI decisions or generic patterns?
What design decisions will haunt the implementer if left ambiguous?
Be opinionated. No hedging." -C "$_REPO_ROOT" -s read-only --enable web_search_cached
```
Timeout: 10 minutes

**Claude design subagent** (via Agent tool):

@@ -762,14 +769,17 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
- Dual voices: always run BOTH Claude subagent AND Codex if available (P6).

**Codex eng voice** (via Bash):
```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Review this plan for architectural issues, missing edge cases,
and hidden complexity. Be adversarial.

Also consider these findings from prior review phases:
CEO: <insert CEO consensus table summary — key concerns, DISAGREEs>
Design: <insert Design consensus table summary, or 'skipped, no UI scope'>

File: <plan_path>" -C "$_REPO_ROOT" -s read-only --enable web_search_cached
```
Timeout: 10 minutes

**Claude eng subagent** (via Agent tool):

@@ -198,13 +198,16 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
Run them simultaneously (Agent tool for subagent, Bash for Codex).

**Codex CEO voice** (via Bash):
```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "You are a CEO/founder advisor reviewing a development plan.
Challenge the strategic foundations: Are the premises valid or assumed? Is this the
right problem to solve, or is there a reframing that would be 10x more impactful?
What alternatives were dismissed too quickly? What competitive or market risks are
unaddressed? What scope decisions will look foolish in 6 months? Be adversarial.
No compliments. Just the strategic blind spots.
File: <plan_path>" -C "$_REPO_ROOT" -s read-only --enable web_search_cached
```
Timeout: 10 minutes

**Claude CEO subagent** (via Agent tool):

@@ -303,7 +306,9 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
- Dual voices: always run BOTH Claude subagent AND Codex if available (P6).

**Codex design voice** (via Bash):
```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Read the plan file at <plan_path>. Evaluate this plan's
UI/UX design decisions.

Also consider these findings from the CEO review phase:

@@ -315,7 +320,8 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
accessibility requirements (keyboard nav, contrast, touch targets) specified or
aspirational? Does the plan describe specific UI decisions or generic patterns?
What design decisions will haunt the implementer if left ambiguous?
Be opinionated. No hedging." -C "$_REPO_ROOT" -s read-only --enable web_search_cached
```
Timeout: 10 minutes

**Claude design subagent** (via Agent tool):

@@ -373,14 +379,17 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
- Dual voices: always run BOTH Claude subagent AND Codex if available (P6).

**Codex eng voice** (via Bash):
```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Review this plan for architectural issues, missing edge cases,
and hidden complexity. Be adversarial.

Also consider these findings from prior review phases:
CEO: <insert CEO consensus table summary — key concerns, DISAGREEs>
Design: <insert Design consensus table summary, or 'skipped, no UI scope'>

File: <plan_path>" -C "$_REPO_ROOT" -s read-only --enable web_search_cached
```
Timeout: 10 minutes

**Claude eng subagent** (via Agent tool):

@@ -6,4 +6,13 @@ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
eval "$("$SCRIPT_DIR/gstack-slug" 2>/dev/null)"
GSTACK_HOME="${GSTACK_HOME:-$HOME/.gstack}"
mkdir -p "$GSTACK_HOME/projects/$SLUG"

# Validate: input must be parseable JSON (reject malformed or injection attempts)
INPUT="$1"
if ! printf '%s' "$INPUT" | bun -e "JSON.parse(await Bun.stdin.text())" 2>/dev/null; then
  # Not valid JSON — refuse to append
  echo "gstack-review-log: invalid JSON, skipping" >&2
  exit 1
fi

echo "$INPUT" >> "$GSTACK_HOME/projects/$SLUG/$BRANCH-reviews.jsonl"

@@ -149,15 +149,23 @@ fi

# ─── Construct and append JSON ───────────────────────────────
mkdir -p "$ANALYTICS_DIR"

# Sanitize string fields for JSON safety (strip quotes, backslashes, control chars)
json_safe() { printf '%s' "$1" | tr -d '"\\\n\r\t' | head -c 200; }
SKILL="$(json_safe "$SKILL")"
OUTCOME="$(json_safe "$OUTCOME")"
SESSION_ID="$(json_safe "$SESSION_ID")"
SOURCE="$(json_safe "$SOURCE")"
EVENT_TYPE="$(json_safe "$EVENT_TYPE")"

# Escape null fields — sanitize ERROR_CLASS and FAILED_STEP via json_safe()
ERR_FIELD="null"
[ -n "$ERROR_CLASS" ] && ERR_FIELD="\"$(json_safe "$ERROR_CLASS")\""

ERR_MSG_FIELD="null"
[ -n "$ERROR_MESSAGE" ] && ERR_MSG_FIELD="\"$(printf '%s' "$ERROR_MESSAGE" | head -c 200 | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/	/\\t/g' | tr '\n\r' ' ')\""

STEP_FIELD="null"
[ -n "$FAILED_STEP" ] && STEP_FIELD="\"$(json_safe "$FAILED_STEP")\""

# Cap unreasonable durations
if [ -n "$DURATION" ] && [ "$DURATION" -gt 86400 ] 2>/dev/null; then

@@ -113,12 +113,11 @@ if [ -f "$MARKER_FILE" ]; then
|
||||
OLD="$(cat "$MARKER_FILE" 2>/dev/null | tr -d '[:space:]')"
|
||||
rm -f "$MARKER_FILE"
|
||||
rm -f "$SNOOZE_FILE"
|
||||
mkdir -p "$STATE_DIR"
|
||||
echo "UP_TO_DATE $LOCAL" > "$CACHE_FILE"
|
||||
if [ -n "$OLD" ]; then
|
||||
echo "JUST_UPGRADED $OLD $LOCAL"
|
||||
fi
|
||||
exit 0
|
||||
# Don't exit — fall through to remote check in case
|
||||
# more updates landed since the upgrade
|
||||
fi
|
||||
|
||||
# ─── Step 3: Check cache freshness ──────────────────────────
@@ -511,8 +511,27 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
  }
}

// Clean up Chromium profile locks (can persist after crashes)
// Kill orphaned Chromium processes that may still hold the profile lock.
// The server PID is the Bun process; Chromium is a child that can outlive it
// if the server is killed abruptly (SIGKILL, crash, manual rm of state file).
const profileDir = path.join(process.env.HOME || '/tmp', '.gstack', 'chromium-profile');
try {
  const singletonLock = path.join(profileDir, 'SingletonLock');
  const lockTarget = fs.readlinkSync(singletonLock); // e.g. "hostname-12345"
  const orphanPid = parseInt(lockTarget.split('-').pop() || '', 10);
  if (orphanPid && isProcessAlive(orphanPid)) {
    try { process.kill(orphanPid, 'SIGTERM'); } catch {}
    await new Promise(resolve => setTimeout(resolve, 1000));
    if (isProcessAlive(orphanPid)) {
      try { process.kill(orphanPid, 'SIGKILL'); } catch {}
      await new Promise(resolve => setTimeout(resolve, 500));
    }
  }
} catch {
  // No lock symlink or not readable — nothing to kill
}

// Clean up Chromium profile locks (can persist after crashes)
for (const lockFile of ['SingletonLock', 'SingletonSocket', 'SingletonCookie']) {
  try { fs.unlinkSync(path.join(profileDir, lockFile)); } catch {}
}
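Chromium's `SingletonLock` is a symlink whose target encodes `<hostname>-<pid>`, which is what the hunk above parses to find the lock's owner. A standalone sketch of just that parsing step (the `readlinkSync` and kill logic are omitted; hostnames can themselves contain dashes, so the last segment is the PID):

```typescript
// Chromium's profile SingletonLock symlink points at "<hostname>-<pid>".
// Take the LAST dash-separated segment, since hostnames may contain dashes.
function pidFromLockTarget(target: string): number | null {
  const pid = parseInt(target.split('-').pop() || '', 10);
  return Number.isFinite(pid) && pid > 0 ? pid : null;
}

console.log(pidFromLockTarget('my-host.local-12345')); // 12345
console.log(pidFromLockTarget('garbage'));             // null
```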
@@ -545,17 +564,38 @@ Refs: After 'snapshot', use @e1, @e2... as selectors:
console.log(`Connected to real Chrome\n${status}`);

// Auto-start sidebar agent
const agentScript = path.resolve(__dirname, 'sidebar-agent.ts');
// __dirname is inside $bunfs in compiled binaries — resolve from execPath instead
let agentScript = path.resolve(__dirname, 'sidebar-agent.ts');
if (!fs.existsSync(agentScript)) {
  agentScript = path.resolve(path.dirname(process.execPath), '..', 'src', 'sidebar-agent.ts');
}
try {
  if (!fs.existsSync(agentScript)) {
    throw new Error(`sidebar-agent.ts not found at ${agentScript}`);
  }
  // Clear old agent queue
  const agentQueue = path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
  try { fs.writeFileSync(agentQueue, ''); } catch {}

  // Resolve browse binary path the same way — execPath-relative
  let browseBin = path.resolve(__dirname, '..', 'dist', 'browse');
  if (!fs.existsSync(browseBin)) {
    browseBin = process.execPath; // the compiled binary itself
  }

  // Kill any existing sidebar-agent processes before starting a new one.
  // Old agents have stale auth tokens and will silently fail to relay events,
  // causing the server to mark the agent as "hung".
  try {
    const { spawnSync } = require('child_process');
    spawnSync('pkill', ['-f', 'sidebar-agent\\.ts'], { stdio: 'ignore', timeout: 3000 });
  } catch {}

  const agentProc = Bun.spawn(['bun', 'run', agentScript], {
    cwd: config.projectDir,
    env: {
      ...process.env,
      BROWSE_BIN: path.resolve(__dirname, '..', 'dist', 'browse'),
      BROWSE_BIN: browseBin,
      BROWSE_STATE_FILE: config.stateFile,
      BROWSE_SERVER_PORT: String(newState.port),
    },

@@ -137,7 +137,11 @@ export async function handleMetaCommand(

// Separate target (selector/@ref) from output path
for (const arg of remaining) {
  if (arg.startsWith('@e') || arg.startsWith('@c') || arg.startsWith('.') || arg.startsWith('#') || arg.includes('[')) {
  // File paths containing / and ending with an image/pdf extension are never CSS selectors
  const isFilePath = arg.includes('/') && /\.(png|jpe?g|webp|pdf)$/i.test(arg);
  if (isFilePath) {
    outputPath = arg;
  } else if (arg.startsWith('@e') || arg.startsWith('@c') || arg.startsWith('.') || arg.startsWith('#') || arg.includes('[')) {
    targetSelector = arg;
  } else {
    outputPath = arg;
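The fix hinges on one heuristic: an argument that contains a `/` and ends in an image/PDF extension is an output path, never a CSS selector. Checked in isolation:

```typescript
// The same heuristic the hunk above introduces: a slash plus an image/pdf
// extension means "output file path", even if the arg also starts with "."
// (which would otherwise look like a CSS class selector — issue #495).
const isFilePath = (arg: string): boolean =>
  arg.includes('/') && /\.(png|jpe?g|webp|pdf)$/i.test(arg);

console.log(isFilePath('./shots/page.png')); // true  — relative dot-slash path
console.log(isFilePath('.my-class'));        // false — no slash, stays a selector
```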
@@ -18,6 +18,7 @@ import { handleReadCommand } from './read-commands';
import { handleWriteCommand } from './write-commands';
import { handleMetaCommand } from './meta-commands';
import { handleCookiePickerRoute } from './cookie-picker-routes';
import { sanitizeExtensionUrl } from './sidebar-utils';
import { COMMAND_DESCRIPTIONS } from './commands';
import { handleSnapshot, SNAPSHOT_FLAGS } from './snapshot';
import { resolveConfig, ensureStateDir, readVersionHash } from './config';
@@ -123,7 +124,7 @@ let sidebarSession: SidebarSession | null = null;
let agentProcess: ChildProcess | null = null;
let agentStatus: 'idle' | 'processing' | 'hung' = 'idle';
let agentStartTime: number | null = null;
let messageQueue: Array<{message: string, ts: string}> = [];
let messageQueue: Array<{message: string, ts: string, extensionUrl?: string | null}> = [];
let currentMessage: string | null = null;
let chatBuffer: ChatEntry[] = [];
let chatNextId = 0;
@@ -371,18 +372,27 @@ function processAgentEvent(event: any): void {
  }
}

function spawnClaude(userMessage: string): void {
function spawnClaude(userMessage: string, extensionUrl?: string | null): void {
  agentStatus = 'processing';
  agentStartTime = Date.now();
  currentMessage = userMessage;

  const pageUrl = browserManager.getCurrentUrl() || 'about:blank';
  // Prefer the URL from the Chrome extension (what the user actually sees)
  // over Playwright's page.url() which can be stale in headed mode.
  const sanitizedExtUrl = sanitizeExtensionUrl(extensionUrl);
  const playwrightUrl = browserManager.getCurrentUrl() || 'about:blank';
  const pageUrl = sanitizedExtUrl || playwrightUrl;
  const B = BROWSE_BIN;
  const systemPrompt = [
    'You are a browser assistant running in a Chrome sidebar.',
    `Current page: ${pageUrl}`,
    `The user is currently viewing: ${pageUrl}`,
    `Browse binary: ${B}`,
    '',
    'IMPORTANT: You are controlling a SHARED browser. The user may have navigated',
    'manually. Always run `' + B + ' url` first to check the actual current URL.',
    'If it differs from above, the user navigated — work with the ACTUAL page.',
    'Do NOT navigate away from the user\'s current page unless they ask you to.',
    '',
    'Commands (run via bash):',
    ` ${B} goto <url> ${B} click <@ref> ${B} fill <@ref> <text>`,
    ` ${B} snapshot -i ${B} text ${B} screenshot`,
@@ -404,8 +414,8 @@ function spawnClaude(userMessage: string): void {
  // fails with ENOENT on everything, including /bin/bash). Instead,
  // write the command to a queue file that the sidebar-agent process
  // (running as non-compiled bun) picks up and spawns claude.
  const gstackDir = path.join(process.env.HOME || '/tmp', '.gstack');
  const agentQueue = path.join(gstackDir, 'sidebar-agent-queue.jsonl');
  const agentQueue = process.env.SIDEBAR_QUEUE_PATH || path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
  const gstackDir = path.dirname(agentQueue);
  const entry = JSON.stringify({
    ts: new Date().toISOString(),
    message: userMessage,
@@ -414,6 +424,7 @@ function spawnClaude(userMessage: string): void {
    stateFile: config.stateFile,
    cwd: (sidebarSession as any)?.worktreePath || process.cwd(),
    sessionId: sidebarSession?.claudeSessionId || null,
    pageUrl: pageUrl,
  });
  try {
    fs.mkdirSync(gstackDir, { recursive: true });
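The handoff described in the comments above — server appends one JSON object per line, sidebar-agent polls the file — can be sketched in isolation. This is a minimal, self-contained illustration under assumed paths (a temp file stands in for the real queue); newline-delimited JSON plus append writes keeps producer and consumer from tearing each other's records:

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Sketch of the JSONL queue handoff (file path is hypothetical).
// Producer side: one JSON object per line, appended atomically enough
// for a single local writer.
const queue = path.join(os.tmpdir(), 'demo-agent-queue.jsonl');
fs.writeFileSync(queue, '');
fs.appendFileSync(queue, JSON.stringify({ ts: new Date().toISOString(), message: 'hi' }) + '\n');

// Consumer side: track a byte offset and only parse complete lines past it
// (the real agent does this on a polling interval).
let offset = 0;
const data = fs.readFileSync(queue, 'utf-8');
const entries = data.slice(offset).split('\n').filter(Boolean).map(l => JSON.parse(l));
offset = data.length;
console.log(entries[0].message); // hi
```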
@@ -781,12 +792,16 @@ async function start() {
  const port = await findPort();

  // Launch browser (headless or headed with extension)
  const headed = process.env.BROWSE_HEADED === '1';
  if (headed) {
    await browserManager.launchHeaded();
    console.log(`[browse] Launched headed Chromium with extension`);
  } else {
    await browserManager.launch();
  // BROWSE_HEADLESS_SKIP=1 skips browser launch entirely (for HTTP-only testing)
  const skipBrowser = process.env.BROWSE_HEADLESS_SKIP === '1';
  if (!skipBrowser) {
    const headed = process.env.BROWSE_HEADED === '1';
    if (headed) {
      await browserManager.launchHeaded();
      console.log(`[browse] Launched headed Chromium with extension`);
    } else {
      await browserManager.launch();
    }
  }

  const startTime = Date.now();
@@ -935,17 +950,21 @@ async function start() {
  if (!msg) {
    return new Response(JSON.stringify({ error: 'Empty message' }), { status: 400, headers: { 'Content-Type': 'application/json' } });
  }
  // The Chrome extension sends the active tab's URL — prefer it over
  // Playwright's page.url() which can be stale in headed mode when
  // the user navigates manually.
  const extensionUrl = body.activeTabUrl || null;
  const ts = new Date().toISOString();
  addChatEntry({ ts, role: 'user', message: msg });
  if (sidebarSession) { sidebarSession.lastActiveAt = ts; saveSession(); }

  if (agentStatus === 'idle') {
    spawnClaude(msg);
    spawnClaude(msg, extensionUrl);
    return new Response(JSON.stringify({ ok: true, processing: true }), {
      status: 200, headers: { 'Content-Type': 'application/json' },
    });
  } else if (messageQueue.length < MAX_QUEUE) {
    messageQueue.push({ message: msg, ts });
    messageQueue.push({ message: msg, ts, extensionUrl });
    return new Response(JSON.stringify({ ok: true, queued: true, position: messageQueue.length }), {
      status: 200, headers: { 'Content-Type': 'application/json' },
    });
@@ -979,7 +998,7 @@ async function start() {
  // Process next in queue
  if (messageQueue.length > 0) {
    const next = messageQueue.shift()!;
    spawnClaude(next.message);
    spawnClaude(next.message, next.extensionUrl);
  }
  return new Response(JSON.stringify({ ok: true }), { status: 200, headers: { 'Content-Type': 'application/json' } });
}
@@ -1065,7 +1084,7 @@ async function start() {
  // Process next queued message
  if (messageQueue.length > 0) {
    const next = messageQueue.shift()!;
    spawnClaude(next.message);
    spawnClaude(next.message, next.extensionUrl);
  } else {
    agentStatus = 'idle';
  }

@@ -13,7 +13,7 @@ import { spawn } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';

const QUEUE = path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
const QUEUE = process.env.SIDEBAR_QUEUE_PATH || path.join(process.env.HOME || '/tmp', '.gstack', 'sidebar-agent-queue.jsonl');
const SERVER_PORT = parseInt(process.env.BROWSE_SERVER_PORT || '34567', 10);
const SERVER_URL = `http://127.0.0.1:${SERVER_PORT}`;
const POLL_MS = 500; // Fast polling — server already did the user-facing response
@@ -205,14 +205,15 @@ async function askClaude(queueEntry: any): Promise<void> {
    });
  });

  // Timeout after 300 seconds (5 min — multi-page tasks need time)
  // Timeout (default 300s / 5 min — multi-page tasks need time)
  const timeoutMs = parseInt(process.env.SIDEBAR_AGENT_TIMEOUT || '300000', 10);
  setTimeout(() => {
    try { proc.kill(); } catch {}
    sendEvent({ type: 'agent_error', error: 'Timed out after 300s' }).then(() => {
    sendEvent({ type: 'agent_error', error: `Timed out after ${timeoutMs / 1000}s` }).then(() => {
      isProcessing = false;
      resolve();
    });
  }, 300000);
  }, timeoutMs);
  });
}

@@ -0,0 +1,21 @@
/**
 * Shared sidebar utilities — extracted for testability.
 */

/**
 * Sanitize a URL from the Chrome extension before embedding in a prompt.
 * Only accepts http/https, strips control characters, truncates to 2048 chars.
 * Returns null if the URL is invalid or uses a non-http scheme.
 */
export function sanitizeExtensionUrl(url: string | null | undefined): string | null {
  if (!url) return null;
  try {
    const u = new URL(url);
    if (u.protocol === 'http:' || u.protocol === 'https:') {
      return u.href.replace(/[\x00-\x1f\x7f]/g, '').slice(0, 2048);
    }
    return null;
  } catch {
    return null;
  }
}
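The behavior of `sanitizeExtensionUrl` in practice, with the function reproduced from the new file above so the example is self-contained: http/https URLs pass through, privileged schemes like `chrome://` and unparseable strings both come back as `null`, which is what lets the caller fall back to Playwright's URL.

```typescript
// Reproduced from sidebar-utils above so this example runs standalone.
function sanitizeExtensionUrl(url: string | null | undefined): string | null {
  if (!url) return null;
  try {
    const u = new URL(url);
    if (u.protocol === 'http:' || u.protocol === 'https:') {
      return u.href.replace(/[\x00-\x1f\x7f]/g, '').slice(0, 2048);
    }
    return null;
  } catch {
    return null;
  }
}

console.log(sanitizeExtensionUrl('https://example.com/test')); // https://example.com/test
console.log(sanitizeExtensionUrl('chrome://extensions'));      // null — non-http scheme
console.log(sanitizeExtensionUrl('not a url'));                // null — URL() throws
```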
@@ -543,6 +543,17 @@ describe('Visual', () => {
    }
  });

  test('screenshot treats relative dot-slash path as file path, not CSS selector', async () => {
    await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
    // ./path/to/file.png must be treated as output path, not a CSS class selector (#495)
    const relPath = './browse-test-dotpath.png';
    const absPath = path.resolve(relPath);
    const result = await handleMetaCommand('screenshot', [relPath], bm, async () => {});
    expect(result).toContain('Screenshot saved');
    expect(fs.existsSync(absPath)).toBe(true);
    fs.unlinkSync(absPath);
  });

  test('screenshot with nonexistent selector throws timeout', async () => {
    await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
    try {

@@ -92,6 +92,35 @@ describe('gstack-update-check', () => {
    expect(cache).toContain('UP_TO_DATE');
  });

  // ─── Path C2: Just-upgraded marker + newer remote ──────────
  test('just-upgraded marker does not mask newer remote version', () => {
    writeFileSync(join(gstackDir, 'VERSION'), '0.4.0\n');
    writeFileSync(join(stateDir, 'just-upgraded-from'), '0.3.3\n');
    writeFileSync(join(gstackDir, 'REMOTE_VERSION'), '0.5.0\n');

    const { exitCode, stdout } = run();
    expect(exitCode).toBe(0);
    // Should output both the just-upgraded notice AND the new upgrade
    expect(stdout).toContain('JUST_UPGRADED 0.3.3 0.4.0');
    expect(stdout).toContain('UPGRADE_AVAILABLE 0.4.0 0.5.0');
    // Cache should reflect the upgrade available, not UP_TO_DATE
    const cache = readFileSync(join(stateDir, 'last-update-check'), 'utf-8');
    expect(cache).toContain('UPGRADE_AVAILABLE 0.4.0 0.5.0');
  });

  // ─── Path C3: Just-upgraded marker + remote matches local ──
  test('just-upgraded with no further updates writes UP_TO_DATE cache', () => {
    writeFileSync(join(gstackDir, 'VERSION'), '0.4.0\n');
    writeFileSync(join(stateDir, 'just-upgraded-from'), '0.3.3\n');
    writeFileSync(join(gstackDir, 'REMOTE_VERSION'), '0.4.0\n');

    const { exitCode, stdout } = run();
    expect(exitCode).toBe(0);
    expect(stdout).toBe('JUST_UPGRADED 0.3.3 0.4.0');
    const cache = readFileSync(join(stateDir, 'last-update-check'), 'utf-8');
    expect(cache).toContain('UP_TO_DATE');
  });
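The two tests above pin down the marker semantics: after an upgrade, emit `JUST_UPGRADED` and then still fall through to the remote check so a newer remote version is not masked. A hypothetical TypeScript sketch of that decision table (function name and the plain string inequality are assumptions — the real script is shell and would compare versions properly):

```typescript
// Hypothetical model of the update-check output the tests above expect.
// Plain inequality stands in for a real version comparison in this sketch.
function updateCheck(local: string, remote: string, justUpgradedFrom: string | null): string[] {
  const out: string[] = [];
  if (justUpgradedFrom) out.push(`JUST_UPGRADED ${justUpgradedFrom} ${local}`);
  // Falling through here (instead of exiting after JUST_UPGRADED) is the
  // behavioral change the diff above makes.
  if (remote !== local) out.push(`UPGRADE_AVAILABLE ${local} ${remote}`);
  return out;
}

console.log(updateCheck('0.4.0', '0.5.0', '0.3.3'));
// [ 'JUST_UPGRADED 0.3.3 0.4.0', 'UPGRADE_AVAILABLE 0.4.0 0.5.0' ]
console.log(updateCheck('0.4.0', '0.4.0', '0.3.3'));
// [ 'JUST_UPGRADED 0.3.3 0.4.0' ]
```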

  // ─── Path D1: Fresh cache, UP_TO_DATE ───────────────────────
  test('exits silently when cache says UP_TO_DATE and is fresh', () => {
    writeFileSync(join(gstackDir, 'VERSION'), '0.3.3\n');

@@ -0,0 +1,226 @@
/**
 * Layer 3: Sidebar agent round-trip tests.
 * Starts server + sidebar-agent together. Mocks the `claude` binary with a shell
 * script that outputs canned stream-json. Verifies events flow end-to-end:
 * POST /sidebar-command → queue → sidebar-agent → mock claude → events → /sidebar-chat
 */

import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { spawn, type Subprocess } from 'bun';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

let serverProc: Subprocess | null = null;
let agentProc: Subprocess | null = null;
let serverPort: number = 0;
let authToken: string = '';
let tmpDir: string = '';
let stateFile: string = '';
let queueFile: string = '';
let mockBinDir: string = '';

async function api(pathname: string, opts: RequestInit = {}): Promise<Response> {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    ...(opts.headers as Record<string, string> || {}),
  };
  if (!headers['Authorization'] && authToken) {
    headers['Authorization'] = `Bearer ${authToken}`;
  }
  return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
}

async function resetState() {
  await api('/sidebar-session/new', { method: 'POST' });
  fs.writeFileSync(queueFile, '');
}

async function pollChatUntil(
  predicate: (entries: any[]) => boolean,
  timeoutMs = 10000,
): Promise<any[]> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const resp = await api('/sidebar-chat?after=0');
    const data = await resp.json();
    if (predicate(data.entries)) return data.entries;
    await new Promise(r => setTimeout(r, 300));
  }
  // Return whatever we have on timeout
  const resp = await api('/sidebar-chat?after=0');
  return (await resp.json()).entries;
}

function writeMockClaude(script: string) {
  const mockPath = path.join(mockBinDir, 'claude');
  fs.writeFileSync(mockPath, script, { mode: 0o755 });
}

beforeAll(async () => {
  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-roundtrip-'));
  stateFile = path.join(tmpDir, 'browse.json');
  queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
  mockBinDir = path.join(tmpDir, 'bin');
  fs.mkdirSync(mockBinDir, { recursive: true });
  fs.mkdirSync(path.dirname(queueFile), { recursive: true });

  // Write default mock claude that outputs canned events
  writeMockClaude(`#!/bin/bash
echo '{"type":"system","session_id":"mock-session-123"}'
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"I can see the page. It looks like a test fixture."}]}}'
echo '{"type":"result","result":"Done."}'
`);

  // Start server (no browser)
  const serverScript = path.resolve(__dirname, '..', 'src', 'server.ts');
  serverProc = spawn(['bun', 'run', serverScript], {
    env: {
      ...process.env,
      BROWSE_STATE_FILE: stateFile,
      BROWSE_HEADLESS_SKIP: '1',
      BROWSE_PORT: '0',
      SIDEBAR_QUEUE_PATH: queueFile,
      BROWSE_IDLE_TIMEOUT: '300',
    },
    stdio: ['ignore', 'pipe', 'pipe'],
  });

  // Wait for server
  const deadline = Date.now() + 15000;
  while (Date.now() < deadline) {
    if (fs.existsSync(stateFile)) {
      try {
        const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
        if (state.port && state.token) {
          serverPort = state.port;
          authToken = state.token;
          break;
        }
      } catch {}
    }
    await new Promise(r => setTimeout(r, 100));
  }
  if (!serverPort) throw new Error('Server did not start in time');

  // Start sidebar-agent with mock claude on PATH
  const agentScript = path.resolve(__dirname, '..', 'src', 'sidebar-agent.ts');
  agentProc = spawn(['bun', 'run', agentScript], {
    env: {
      ...process.env,
      PATH: `${mockBinDir}:${process.env.PATH}`,
      BROWSE_SERVER_PORT: String(serverPort),
      BROWSE_STATE_FILE: stateFile,
      SIDEBAR_QUEUE_PATH: queueFile,
      SIDEBAR_AGENT_TIMEOUT: '10000',
      BROWSE_BIN: 'browse', // doesn't matter, mock claude doesn't use it
    },
    stdio: ['ignore', 'pipe', 'pipe'],
  });

  // Give sidebar-agent time to start polling
  await new Promise(r => setTimeout(r, 1000));
}, 20000);

afterAll(() => {
  if (agentProc) { try { agentProc.kill(); } catch {} }
  if (serverProc) { try { serverProc.kill(); } catch {} }
  try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
});

describe('sidebar-agent round-trip', () => {
  test('full message round-trip with mock claude', async () => {
    await resetState();

    // Send a command
    const resp = await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({
        message: 'what is on this page?',
        activeTabUrl: 'https://example.com/test',
      }),
    });
    expect(resp.status).toBe(200);

    // Wait for mock claude to process and events to arrive
    const entries = await pollChatUntil(
      (entries) => entries.some((e: any) => e.type === 'agent_done'),
      15000,
    );

    // Verify the flow: user message → agent_start → text → agent_done
    const userEntry = entries.find((e: any) => e.role === 'user');
    expect(userEntry).toBeDefined();
    expect(userEntry.message).toBe('what is on this page?');

    // The mock claude outputs text — check for any agent text entry
    const textEntries = entries.filter((e: any) => e.role === 'agent' && (e.type === 'text' || e.type === 'result'));
    expect(textEntries.length).toBeGreaterThan(0);

    const doneEntry = entries.find((e: any) => e.type === 'agent_done');
    expect(doneEntry).toBeDefined();

    // Agent should be back to idle
    const session = await (await api('/sidebar-session')).json();
    expect(session.agent.status).toBe('idle');
  }, 20000);

  test('claude crash produces agent_error', async () => {
    await resetState();

    // Replace mock claude with one that crashes
    writeMockClaude(`#!/bin/bash
echo '{"type":"system","session_id":"crash-test"}' >&2
exit 1
`);

    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'crash test' }),
    });

    // Wait for agent_done (sidebar-agent sends agent_done even on crash via proc.on('close'))
    const entries = await pollChatUntil(
      (entries) => entries.some((e: any) => e.type === 'agent_done' || e.type === 'agent_error'),
      15000,
    );

    // Agent should recover to idle
    const session = await (await api('/sidebar-session')).json();
    expect(session.agent.status).toBe('idle');

    // Restore working mock
    writeMockClaude(`#!/bin/bash
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"recovered"}]}}'
`);
  }, 20000);

  test('sequential queue drain', async () => {
    await resetState();

    // Restore working mock
    writeMockClaude(`#!/bin/bash
echo '{"type":"assistant","message":{"content":[{"type":"text","text":"response to: '"'"'$*'"'"'"}]}}'
`);

    // Send two messages rapidly — first processes, second queues
    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'first message' }),
    });
    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'second message' }),
    });

    // Wait for both to complete (two agent_done events)
    const entries = await pollChatUntil(
      (entries) => entries.filter((e: any) => e.type === 'agent_done').length >= 2,
      20000,
    );

    // Both user messages should be in chat
    const userEntries = entries.filter((e: any) => e.role === 'user');
    expect(userEntries.length).toBeGreaterThanOrEqual(2);
  }, 25000);
});
@@ -0,0 +1,320 @@
/**
 * Layer 2: Server HTTP integration tests for sidebar endpoints.
 * Starts the browse server as a subprocess (no browser via BROWSE_HEADLESS_SKIP),
 * exercises sidebar HTTP endpoints with fetch(). No Chrome, no Claude, no sidebar-agent.
 */

import { describe, test, expect, beforeAll, afterAll, beforeEach } from 'bun:test';
import { spawn, type Subprocess } from 'bun';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

let serverProc: Subprocess | null = null;
let serverPort: number = 0;
let authToken: string = '';
let tmpDir: string = '';
let stateFile: string = '';
let queueFile: string = '';

async function api(pathname: string, opts: RequestInit & { noAuth?: boolean } = {}): Promise<Response> {
  const { noAuth, ...fetchOpts } = opts;
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    ...(fetchOpts.headers as Record<string, string> || {}),
  };
  if (!noAuth && !headers['Authorization'] && authToken) {
    headers['Authorization'] = `Bearer ${authToken}`;
  }
  return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...fetchOpts, headers });
}

beforeAll(async () => {
  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-integ-'));
  stateFile = path.join(tmpDir, 'browse.json');
  queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');

  // Ensure queue dir exists
  fs.mkdirSync(path.dirname(queueFile), { recursive: true });

  const serverScript = path.resolve(__dirname, '..', 'src', 'server.ts');
  serverProc = spawn(['bun', 'run', serverScript], {
    env: {
      ...process.env,
      BROWSE_STATE_FILE: stateFile,
      BROWSE_HEADLESS_SKIP: '1',
      BROWSE_PORT: '0',
      SIDEBAR_QUEUE_PATH: queueFile,
      BROWSE_IDLE_TIMEOUT: '300',
    },
    stdio: ['ignore', 'pipe', 'pipe'],
  });

  // Wait for state file
  const deadline = Date.now() + 15000;
  while (Date.now() < deadline) {
    if (fs.existsSync(stateFile)) {
      try {
        const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
        if (state.port && state.token) {
          serverPort = state.port;
          authToken = state.token;
          break;
        }
      } catch {}
    }
    await new Promise(r => setTimeout(r, 100));
  }
  if (!serverPort) throw new Error('Server did not start in time');
}, 20000);

afterAll(() => {
  if (serverProc) { try { serverProc.kill(); } catch {} }
  try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
});

// Reset state between tests — creates a fresh session, clears all queues
async function resetState() {
  await api('/sidebar-session/new', { method: 'POST' });
  fs.writeFileSync(queueFile, '');
}

describe('sidebar auth', () => {
  test('rejects request without auth token', async () => {
    const resp = await api('/sidebar-command', {
      method: 'POST',
      noAuth: true,
      body: JSON.stringify({ message: 'test' }),
    });
    expect(resp.status).toBe(401);
  });

  test('rejects request with wrong token', async () => {
    const resp = await api('/sidebar-command', {
      method: 'POST',
      headers: { 'Authorization': 'Bearer wrong-token' },
      body: JSON.stringify({ message: 'test' }),
    });
    expect(resp.status).toBe(401);
  });

  test('accepts request with correct token', async () => {
    const resp = await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'hello' }),
    });
    expect(resp.status).toBe(200);
    // Clean up
    await api('/sidebar-agent/kill', { method: 'POST' });
  });
});

describe('sidebar-command → queue', () => {
  test('writes queue entry with activeTabUrl', async () => {
    await resetState();

    const resp = await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({
        message: 'what is on this page?',
        activeTabUrl: 'https://example.com/test-page',
      }),
    });
    expect(resp.status).toBe(200);
    const data = await resp.json();
    expect(data.ok).toBe(true);

    // Give server a moment to write queue
    await new Promise(r => setTimeout(r, 100));

    const content = fs.readFileSync(queueFile, 'utf-8').trim();
    const lines = content.split('\n').filter(Boolean);
    expect(lines.length).toBeGreaterThan(0);
    const entry = JSON.parse(lines[lines.length - 1]);
    expect(entry.pageUrl).toBe('https://example.com/test-page');
    expect(entry.prompt).toContain('https://example.com/test-page');

    await api('/sidebar-agent/kill', { method: 'POST' });
  });

  test('falls back when activeTabUrl is null', async () => {
    await resetState();

    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'test', activeTabUrl: null }),
    });
    await new Promise(r => setTimeout(r, 100));

    const lines = fs.readFileSync(queueFile, 'utf-8').trim().split('\n').filter(Boolean);
    expect(lines.length).toBeGreaterThan(0);
    const entry = JSON.parse(lines[lines.length - 1]);
    // No browser → playwright URL is 'about:blank'
    expect(entry.pageUrl).toBe('about:blank');

    await api('/sidebar-agent/kill', { method: 'POST' });
  });

  test('rejects chrome:// activeTabUrl and falls back', async () => {
    await resetState();

    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'test', activeTabUrl: 'chrome://extensions' }),
    });
    await new Promise(r => setTimeout(r, 100));

    const lines = fs.readFileSync(queueFile, 'utf-8').trim().split('\n').filter(Boolean);
    expect(lines.length).toBeGreaterThan(0);
    const entry = JSON.parse(lines[lines.length - 1]);
    expect(entry.pageUrl).toBe('about:blank');

    await api('/sidebar-agent/kill', { method: 'POST' });
  });

  test('rejects empty message', async () => {
    const resp = await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: '' }),
    });
    expect(resp.status).toBe(400);
  });
});

describe('sidebar-agent/event → chat buffer', () => {
  test('agent events appear in /sidebar-chat', async () => {
    await resetState();

    // Post mock agent events using Claude's streaming format
    await api('/sidebar-agent/event', {
      method: 'POST',
      body: JSON.stringify({
        type: 'assistant',
        message: { content: [{ type: 'text', text: 'Hello from mock agent' }] },
      }),
    });

    const chatData = await (await api('/sidebar-chat?after=0')).json();
    const textEntry = chatData.entries.find((e: any) => e.type === 'text');
    expect(textEntry).toBeDefined();
    expect(textEntry.text).toBe('Hello from mock agent');
  });

  test('agent_done transitions status to idle', async () => {
    await resetState();
    // Start a command so agent is processing
    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'test' }),
    });

    // Verify processing
    let session = await (await api('/sidebar-session')).json();
    expect(session.agent.status).toBe('processing');

    // Send agent_done
    await api('/sidebar-agent/event', {
      method: 'POST',
      body: JSON.stringify({ type: 'agent_done' }),
    });

    session = await (await api('/sidebar-session')).json();
    expect(session.agent.status).toBe('idle');
  });
});

describe('message queuing', () => {
  test('queues message when agent is processing', async () => {
    await resetState();

    // First message starts processing
    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'first' }),
    });

    // Second message gets queued
    const resp = await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'second' }),
    });
    const data = await resp.json();
    expect(data.ok).toBe(true);
    expect(data.queued).toBe(true);
    expect(data.position).toBe(1);

    await api('/sidebar-agent/kill', { method: 'POST' });
  });

  test('returns 429 when queue is full', async () => {
    await resetState();

    // First message starts processing
    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'first' }),
    });

    // Fill queue (max 5)
    for (let i = 0; i < 5; i++) {
      await api('/sidebar-command', {
        method: 'POST',
        body: JSON.stringify({ message: `fill-${i}` }),
      });
    }

    // 7th message should be rejected
    const resp = await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({ message: 'overflow' }),
    });
    expect(resp.status).toBe(429);

    await api('/sidebar-agent/kill', { method: 'POST' });
  });
});

describe('chat clear', () => {
  test('clears chat buffer', async () => {
    await resetState();
|
||||
// Add some entries
|
||||
await api('/sidebar-agent/event', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({ type: 'text', text: 'to be cleared' }),
|
||||
});
|
||||
|
||||
await api('/sidebar-chat/clear', { method: 'POST' });
|
||||
|
||||
const data = await (await api('/sidebar-chat?after=0')).json();
|
||||
expect(data.entries.length).toBe(0);
|
||||
expect(data.total).toBe(0);
|
||||
});
|
||||
});
|
||||
|
||||
describe('agent kill', () => {
|
||||
test('kill adds error entry and returns to idle', async () => {
|
||||
await resetState();
|
||||
|
||||
// Start a command so agent is processing
|
||||
await api('/sidebar-command', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({ message: 'kill me' }),
|
||||
});
|
||||
|
||||
let session = await (await api('/sidebar-session')).json();
|
||||
expect(session.agent.status).toBe('processing');
|
||||
|
||||
// Kill the agent
|
||||
const killResp = await api('/sidebar-agent/kill', { method: 'POST' });
|
||||
expect(killResp.status).toBe(200);
|
||||
|
||||
// Check chat for error entry
|
||||
const chatData = await (await api('/sidebar-chat?after=0')).json();
|
||||
const errorEntry = chatData.entries.find((e: any) => e.error === 'Killed by user');
|
||||
expect(errorEntry).toBeDefined();
|
||||
|
||||
// Agent should be idle (no queue items to auto-process)
|
||||
session = await (await api('/sidebar-session')).json();
|
||||
expect(session.agent.status).toBe('idle');
|
||||
});
|
||||
});
|
||||
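The queuing tests above pin down a small acceptance policy: the first message starts processing immediately, later messages queue up with a 1-based position, and a full queue (5 entries) is rejected with 429. A minimal sketch of that policy in isolation (hypothetical names and shapes, not the actual server implementation):

```typescript
// Hypothetical sketch of the acceptance logic the queuing tests assert.
interface CommandResult {
  ok: boolean;
  queued?: boolean;   // true when the message was deferred behind a running task
  position?: number;  // 1-based slot in the queue
  status?: number;    // HTTP-style status on rejection
}

const MAX_QUEUE = 5;

class SidebarQueue {
  private processing = false;
  private queue: string[] = [];

  submit(message: string): CommandResult {
    if (!this.processing) {
      // No task running: this message starts processing immediately
      this.processing = true;
      return { ok: true, queued: false };
    }
    if (this.queue.length >= MAX_QUEUE) {
      // Queue at capacity: reject instead of growing unboundedly
      return { ok: false, status: 429 };
    }
    this.queue.push(message);
    return { ok: true, queued: true, position: this.queue.length };
  }
}
```

With one message processing and five queued, the seventh submission is the first to hit the 429 path, which is exactly the shape the integration test exercises.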
@@ -0,0 +1,96 @@
/**
 * Layer 1: Unit tests for sidebar utilities.
 * Tests pure functions — no server, no processes, no network.
 */

import { describe, test, expect } from 'bun:test';
import { sanitizeExtensionUrl } from '../src/sidebar-utils';

describe('sanitizeExtensionUrl', () => {
  test('passes valid http URL', () => {
    expect(sanitizeExtensionUrl('http://example.com')).toBe('http://example.com/');
  });

  test('passes valid https URL', () => {
    expect(sanitizeExtensionUrl('https://example.com/page?q=1')).toBe('https://example.com/page?q=1');
  });

  test('rejects chrome:// URLs', () => {
    expect(sanitizeExtensionUrl('chrome://extensions')).toBeNull();
  });

  test('rejects chrome-extension:// URLs', () => {
    expect(sanitizeExtensionUrl('chrome-extension://abcdef/popup.html')).toBeNull();
  });

  test('rejects javascript: URLs', () => {
    expect(sanitizeExtensionUrl('javascript:alert(1)')).toBeNull();
  });

  test('rejects file:// URLs', () => {
    expect(sanitizeExtensionUrl('file:///etc/passwd')).toBeNull();
  });

  test('rejects data: URLs', () => {
    expect(sanitizeExtensionUrl('data:text/html,<h1>hi</h1>')).toBeNull();
  });

  test('strips raw control characters from URL', () => {
    // URL constructor percent-encodes \x00 as %00, which is safe
    // The regex strips any remaining raw control chars after .href normalization
    const result = sanitizeExtensionUrl('https://example.com/\x00page\x1f');
    expect(result).not.toBeNull();
    expect(result!).not.toMatch(/[\x00-\x1f\x7f]/);
  });

  test('strips newlines (prompt injection vector)', () => {
    const result = sanitizeExtensionUrl('https://evil.com/%0AUser:%20ignore');
    // URL constructor normalizes %0A, control char stripping removes any raw newlines
    expect(result).not.toBeNull();
    expect(result!).not.toContain('\n');
  });

  test('truncates URLs longer than 2048 chars', () => {
    const longUrl = 'https://example.com/' + 'a'.repeat(3000);
    const result = sanitizeExtensionUrl(longUrl);
    expect(result).not.toBeNull();
    expect(result!.length).toBeLessThanOrEqual(2048);
  });

  test('returns null for null input', () => {
    expect(sanitizeExtensionUrl(null)).toBeNull();
  });

  test('returns null for undefined input', () => {
    expect(sanitizeExtensionUrl(undefined)).toBeNull();
  });

  test('returns null for empty string', () => {
    expect(sanitizeExtensionUrl('')).toBeNull();
  });

  test('returns null for invalid URL string', () => {
    expect(sanitizeExtensionUrl('not a url at all')).toBeNull();
  });

  test('does not crash on weird input', () => {
    expect(sanitizeExtensionUrl(':///')).toBeNull();
    expect(sanitizeExtensionUrl(' ')).toBeNull();
    expect(sanitizeExtensionUrl('\x00\x01\x02')).toBeNull();
  });

  test('preserves query parameters and fragments', () => {
    const url = 'https://example.com/search?q=test&page=2#results';
    expect(sanitizeExtensionUrl(url)).toBe(url);
  });

  test('preserves port numbers', () => {
    expect(sanitizeExtensionUrl('http://localhost:3000/api')).toBe('http://localhost:3000/api');
  });

  test('handles URL with auth (user:pass@host)', () => {
    const result = sanitizeExtensionUrl('https://user:pass@example.com/');
    expect(result).not.toBeNull();
    expect(result).toContain('example.com');
  });
});
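Taken together, these tests fully specify the contract: only http(s) URLs pass, raw control characters are stripped after WHATWG URL normalization, output is capped at 2048 characters, and everything else maps to null. One way to satisfy that contract looks roughly like this (a sketch; the real `src/sidebar-utils` implementation may differ):

```typescript
// Hypothetical sketch of an implementation satisfying the tests above —
// not the actual src/sidebar-utils code.
function sanitizeExtensionUrl(raw: string | null | undefined): string | null {
  if (!raw) return null; // null, undefined, '' all fall through to null
  let parsed: URL;
  try {
    parsed = new URL(raw); // throws on 'not a url at all', ':///', etc.
  } catch {
    return null;
  }
  // Allow-list: only web pages may be reported as the active tab.
  // chrome://, chrome-extension://, javascript:, file://, data: all fail here.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') return null;
  // .href normalization percent-encodes most control chars; strip any survivors
  const cleaned = parsed.href.replace(/[\x00-\x1f\x7f]/g, '');
  // Cap length to bound what gets injected into downstream prompts
  return cleaned.slice(0, 2048);
}
```

The allow-list check (rather than a deny-list of bad schemes) is what makes the "weird input" cases cheap: anything the URL parser rejects, or anything outside http(s), collapses to null without special handling.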
+11
-3
@@ -428,6 +428,8 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)

2. Run the review (5-minute timeout):
```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
```

@@ -436,6 +438,8 @@ If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`.
Use `timeout: 300000` on the Bash call. If the user provided custom instructions
(e.g., `/codex review focus on security`), pass them as the prompt argument:
```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
```

@@ -577,7 +581,8 @@ With focus (e.g., "security"):
If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`.

```bash
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c "
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c "
import sys, json
for line in sys.stdin:
    line = line.strip()

@@ -645,6 +650,7 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
3. **Plan review auto-detection:** If the user's prompt is about reviewing a plan,
or if plan files exist and the user said `/codex` with no arguments:
```bash
setopt +o nomatch 2>/dev/null || true  # zsh compat
ls -t ~/.claude/plans/*.md 2>/dev/null | xargs grep -l "$(basename $(pwd))" 2>/dev/null | head -1
```
If no project-scoped match, fall back to `ls -t ~/.claude/plans/*.md 2>/dev/null | head -1`

@@ -676,7 +682,8 @@ If the user passed `--xhigh`, use `"xhigh"` instead of `"medium"`.

For a **new session:**
```bash
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
import sys, json
for line in sys.stdin:
    line = line.strip()

@@ -709,7 +716,8 @@ for line in sys.stdin:

For a **resumed session** (user chose "Continue"):
```bash
codex exec resume <session-id> "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec resume <session-id> "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
<same python streaming parser as above, with flush=True on all print() calls>
"
```
+11
-3
@@ -88,6 +88,8 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)

2. Run the review (5-minute timeout):
```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
```

@@ -96,6 +98,8 @@ If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`.
Use `timeout: 300000` on the Bash call. If the user provided custom instructions
(e.g., `/codex review focus on security`), pass them as the prompt argument:
```bash
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review "focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
```

@@ -172,7 +176,8 @@ With focus (e.g., "security"):
If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`.

```bash
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c "
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached --json 2>/dev/null | PYTHONUNBUFFERED=1 python3 -u -c "
import sys, json
for line in sys.stdin:
    line = line.strip()

@@ -240,6 +245,7 @@ TMPERR=$(mktemp /tmp/codex-err-XXXXXX.txt)
3. **Plan review auto-detection:** If the user's prompt is about reviewing a plan,
or if plan files exist and the user said `/codex` with no arguments:
```bash
setopt +o nomatch 2>/dev/null || true  # zsh compat
ls -t ~/.claude/plans/*.md 2>/dev/null | xargs grep -l "$(basename $(pwd))" 2>/dev/null | head -1
```
If no project-scoped match, fall back to `ls -t ~/.claude/plans/*.md 2>/dev/null | head -1`

@@ -271,7 +277,8 @@ If the user passed `--xhigh`, use `"xhigh"` instead of `"medium"`.

For a **new session:**
```bash
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
import sys, json
for line in sys.stdin:
    line = line.strip()

@@ -304,7 +311,8 @@ for line in sys.stdin:

For a **resumed session** (user chose "Continue"):
```bash
codex exec resume <session-id> "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec resume <session-id> "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached --json 2>"$TMPERR" | PYTHONUNBUFFERED=1 python3 -u -c "
<same python streaming parser as above, with flush=True on all print() calls>
"
```
+112
-46
@@ -343,21 +343,49 @@ If `NEEDS_SETUP`:
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`

## Step 0: Pre-flight cleanup

Before connecting, kill any stale browse servers and clean up lock files that
may have persisted from a crash. This prevents "already connected" false
positives and Chromium profile lock conflicts.

```bash
# Kill any existing browse server
if [ -f "$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json" ]; then
  _OLD_PID=$(cat "$(git rev-parse --show-toplevel)/.gstack/browse.json" 2>/dev/null | grep -o '"pid":[0-9]*' | grep -o '[0-9]*')
  [ -n "$_OLD_PID" ] && kill "$_OLD_PID" 2>/dev/null || true
  sleep 1
  [ -n "$_OLD_PID" ] && kill -9 "$_OLD_PID" 2>/dev/null || true
  rm -f "$(git rev-parse --show-toplevel)/.gstack/browse.json"
fi
# Clean Chromium profile locks (can persist after crashes)
_PROFILE_DIR="$HOME/.gstack/chromium-profile"
for _LF in SingletonLock SingletonSocket SingletonCookie; do
  rm -f "$_PROFILE_DIR/$_LF" 2>/dev/null || true
done
echo "Pre-flight cleanup done"
```

## Step 1: Connect

```bash
$B connect
```

This launches your system Chrome via Playwright with:
- A visible window (headed mode, not headless)
- The gstack Chrome extension pre-loaded
- A green shimmer line + "gstack" pill so you know which window is controlled
This launches Playwright's bundled Chromium in headed mode with:
- A visible window you can watch (not your regular Chrome — it stays untouched)
- The gstack Chrome extension auto-loaded via `launchPersistentContext`
- A golden shimmer line at the top of every page so you know which window is controlled
- A sidebar agent process for chat commands

If Chrome is already running, the server restarts in headed mode with a fresh
Chrome instance. Your regular Chrome stays untouched.
The `connect` command auto-discovers the extension from the gstack install
directory. It always uses port **34567** so the extension can auto-connect.

After connecting, print the output to the user.
After connecting, print the full output to the user. Confirm you see
`Mode: headed` in the output.

If the output shows an error or the mode is not `headed`, run `$B status` and
share the output with the user before proceeding.

## Step 2: Verify

@@ -365,27 +393,41 @@ After connecting, print the output to the user.
$B status
```

Confirm the output shows `Mode: cdp`. Print the port number — the user may need
it for the Side Panel.
Confirm the output shows `Mode: headed`. Read the port from the state file:

```bash
cat "$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json" 2>/dev/null | grep -o '"port":[0-9]*' | grep -o '[0-9]*'
```

The port should be **34567**. If it's different, note it — the user may need it
for the Side Panel.

Also find the extension path so you can help the user if they need to load it manually:

```bash
_EXT_PATH=""
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
[ -n "$_ROOT" ] && [ -f "$_ROOT/.claude/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$_ROOT/.claude/skills/gstack/extension"
[ -z "$_EXT_PATH" ] && [ -f "$HOME/.claude/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$HOME/.claude/skills/gstack/extension"
echo "EXTENSION_PATH: ${_EXT_PATH:-NOT FOUND}"
```

## Step 3: Guide the user to the Side Panel

Use AskUserQuestion:

> Chrome is launched with gstack control. You should see a green shimmer line at the
> top of the Chrome window and a small "gstack" pill in the bottom-right corner.
> Chrome is launched with gstack control. You should see Playwright's Chromium
> (not your regular Chrome) with a golden shimmer line at the top of the page.
>
> The Side Panel extension is pre-loaded. To open it:
> 1. Look for the **puzzle piece icon** (Extensions) in Chrome's toolbar
> 2. Click it → find **gstack browse** → click the **pin icon** to pin it
> 3. Click the **gstack icon** in the toolbar
> 4. Click **Open Side Panel**
> The Side Panel extension should be auto-loaded. To open it:
> 1. Look for the **puzzle piece icon** (Extensions) in the toolbar — it may
>    already show the gstack icon if the extension loaded successfully
> 2. Click the **puzzle piece** → find **gstack browse** → click the **pin icon**
> 3. Click the pinned **gstack icon** in the toolbar
> 4. The Side Panel should open on the right showing a live activity feed
>
> The Side Panel shows a live feed of every browse command in real time.
>
> **Port:** The browse server is on port {PORT} — the extension auto-detects it
> if you're using the Playwright-controlled Chrome. If the badge stays gray, click
> the gstack icon and enter port {PORT} manually.
> **Port:** 34567 (auto-detected — the extension connects automatically in the
> Playwright-controlled Chrome).

Options:
- A) I can see the Side Panel — let's go!
@@ -393,22 +435,34 @@ Options:
- C) Something went wrong

If B: Tell the user:
> The extension should be auto-loaded, but Chrome sometimes doesn't show it
> immediately. Try:
> 1. Type `chrome://extensions` in the address bar
> 2. Look for "gstack browse" — it should be listed and enabled
> 3. If not listed, click "Load unpacked" → navigate to the extension folder
>    (press Cmd+Shift+G in the file picker, paste this path):
>    `{EXTENSION_PATH}`
>
> Then pin it from the puzzle piece icon and open the Side Panel.

If C: Run `$B status` and show the output. Check if the server is healthy.
> The extension is loaded into Playwright's Chromium at launch time, but
> sometimes it doesn't appear immediately. Try these steps:
>
> 1. Type `chrome://extensions` in the address bar
> 2. Look for **"gstack browse"** — it should be listed and enabled
> 3. If it's there but not pinned, go back to any page, click the puzzle piece
>    icon, and pin it
> 4. If it's NOT listed at all, click **"Load unpacked"** and navigate to:
>    - Press **Cmd+Shift+G** in the file picker dialog
>    - Paste this path: `{EXTENSION_PATH}` (use the path from Step 2)
>    - Click **Select**
>
> After loading, pin it and click the icon to open the Side Panel.
>
> If the Side Panel badge stays gray (disconnected), click the gstack icon
> and enter port **34567** manually.

If C:

1. Run `$B status` and show the output
2. If the server is not healthy, re-run Step 0 cleanup + Step 1 connect
3. If the server IS healthy but the browser isn't visible, try `$B focus`
4. If that fails, ask the user what they see (error message, blank screen, etc.)

## Step 4: Demo

After the user confirms the Side Panel is working, run a quick demo so they
can see the activity feed in action:
After the user confirms the Side Panel is working, run a quick demo:

```bash
$B goto https://news.ycombinator.com
@@ -421,7 +475,7 @@ $B snapshot -i
```

Tell the user: "Check the Side Panel — you should see the `goto` and `snapshot`
commands appear in the activity feed. Every command Claude runs will show up here
commands appear in the activity feed. Every command Claude runs shows up here
in real time."

## Step 5: Sidebar chat

@@ -429,8 +483,9 @@ in real time."
After the activity feed demo, tell the user about the sidebar chat:

> The Side Panel also has a **chat tab**. Try typing a message like "take a
> snapshot and describe this page." A child Claude instance will execute your
> request in the browser — you'll see the commands appear in the activity feed.
> snapshot and describe this page." A sidebar agent (a child Claude instance)
> executes your request in the browser — you'll see the commands appear in
> the activity feed as they happen.
>
> The sidebar agent can navigate pages, click buttons, fill forms, and read
> content. Each task gets up to 5 minutes. It runs in an isolated session, so
@@ -440,17 +495,28 @@ After the activity feed demo, tell the user about the sidebar chat:

Tell the user:

> You're all set! Chrome is under Claude's control with the Side Panel showing
> live activity and a chat sidebar for direct commands. Here's what you can do:
> You're all set! Here's what you can do with the connected Chrome:
>
> - **Chat in the sidebar** — type natural language instructions and Claude
>   executes them in the browser
> - **Run any browse command** — `$B goto`, `$B click`, `$B snapshot` — and
>   watch it happen in Chrome + the Side Panel
> - **Use /qa or /design-review** — they'll run in the visible Chrome window
>   instead of headless. No cookie import needed.
> - **`$B focus`** — bring Chrome to the foreground anytime
> - **`$B disconnect`** — return to headless mode when done
> **Watch Claude work in real time:**
> - Run any gstack skill (`/qa`, `/design-review`, `/benchmark`) and watch
>   every action happen in the visible Chrome window + Side Panel feed
> - No cookie import needed — the Playwright browser shares its own session
>
> **Control the browser directly:**
> - **Sidebar chat** — type natural language in the Side Panel and the sidebar
>   agent executes it (e.g., "fill in the login form and submit")
> - **Browse commands** — `$B goto <url>`, `$B click <sel>`, `$B fill <sel> <val>`,
>   `$B snapshot -i` — all visible in Chrome + Side Panel
>
> **Window management:**
> - `$B focus` — bring Chrome to the foreground anytime
> - `$B disconnect` — close headed Chrome and return to headless mode
>
> **What skills look like in headed mode:**
> - `/qa` runs its full test suite in the visible browser — you see every page
>   load, every click, every assertion
> - `/design-review` takes screenshots in the real browser — same pixels you see
> - `/benchmark` measures performance in the headed browser

Then proceed with whatever the user asked to do. If they didn't specify a task,
ask what they'd like to test or browse.
+112
-46
@@ -23,21 +23,49 @@ You see every click, every navigation, every action in real time.
|
||||
|
||||
{{BROWSE_SETUP}}
|
||||
|
||||
## Step 0: Pre-flight cleanup
|
||||
|
||||
Before connecting, kill any stale browse servers and clean up lock files that
|
||||
may have persisted from a crash. This prevents "already connected" false
|
||||
positives and Chromium profile lock conflicts.
|
||||
|
||||
```bash
|
||||
# Kill any existing browse server
|
||||
if [ -f "$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json" ]; then
|
||||
_OLD_PID=$(cat "$(git rev-parse --show-toplevel)/.gstack/browse.json" 2>/dev/null | grep -o '"pid":[0-9]*' | grep -o '[0-9]*')
|
||||
[ -n "$_OLD_PID" ] && kill "$_OLD_PID" 2>/dev/null || true
|
||||
sleep 1
|
||||
[ -n "$_OLD_PID" ] && kill -9 "$_OLD_PID" 2>/dev/null || true
|
||||
rm -f "$(git rev-parse --show-toplevel)/.gstack/browse.json"
|
||||
fi
|
||||
# Clean Chromium profile locks (can persist after crashes)
|
||||
_PROFILE_DIR="$HOME/.gstack/chromium-profile"
|
||||
for _LF in SingletonLock SingletonSocket SingletonCookie; do
|
||||
rm -f "$_PROFILE_DIR/$_LF" 2>/dev/null || true
|
||||
done
|
||||
echo "Pre-flight cleanup done"
|
||||
```
|
||||
|
||||
## Step 1: Connect
|
||||
|
||||
```bash
|
||||
$B connect
|
||||
```
|
||||
|
||||
This launches your system Chrome via Playwright with:
|
||||
- A visible window (headed mode, not headless)
|
||||
- The gstack Chrome extension pre-loaded
|
||||
- A green shimmer line + "gstack" pill so you know which window is controlled
|
||||
This launches Playwright's bundled Chromium in headed mode with:
|
||||
- A visible window you can watch (not your regular Chrome — it stays untouched)
|
||||
- The gstack Chrome extension auto-loaded via `launchPersistentContext`
|
||||
- A golden shimmer line at the top of every page so you know which window is controlled
|
||||
- A sidebar agent process for chat commands
|
||||
|
||||
If Chrome is already running, the server restarts in headed mode with a fresh
|
||||
Chrome instance. Your regular Chrome stays untouched.
|
||||
The `connect` command auto-discovers the extension from the gstack install
|
||||
directory. It always uses port **34567** so the extension can auto-connect.
|
||||
|
||||
After connecting, print the output to the user.
|
||||
After connecting, print the full output to the user. Confirm you see
|
||||
`Mode: headed` in the output.
|
||||
|
||||
If the output shows an error or the mode is not `headed`, run `$B status` and
|
||||
share the output with the user before proceeding.
|
||||
|
||||
## Step 2: Verify
|
||||
|
||||
@@ -45,27 +73,41 @@ After connecting, print the output to the user.
|
||||
$B status
|
||||
```
|
||||
|
||||
Confirm the output shows `Mode: cdp`. Print the port number — the user may need
|
||||
it for the Side Panel.
|
||||
Confirm the output shows `Mode: headed`. Read the port from the state file:
|
||||
|
||||
```bash
|
||||
cat "$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json" 2>/dev/null | grep -o '"port":[0-9]*' | grep -o '[0-9]*'
|
||||
```
|
||||
|
||||
The port should be **34567**. If it's different, note it — the user may need it
|
||||
for the Side Panel.
|
||||
|
||||
Also find the extension path so you can help the user if they need to load it manually:
|
||||
|
||||
```bash
|
||||
_EXT_PATH=""
|
||||
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
|
||||
[ -n "$_ROOT" ] && [ -f "$_ROOT/.claude/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$_ROOT/.claude/skills/gstack/extension"
|
||||
[ -z "$_EXT_PATH" ] && [ -f "$HOME/.claude/skills/gstack/extension/manifest.json" ] && _EXT_PATH="$HOME/.claude/skills/gstack/extension"
|
||||
echo "EXTENSION_PATH: ${_EXT_PATH:-NOT FOUND}"
|
||||
```
|
||||
|
||||
## Step 3: Guide the user to the Side Panel
|
||||
|
||||
Use AskUserQuestion:
|
||||
|
||||
> Chrome is launched with gstack control. You should see a green shimmer line at the
|
||||
> top of the Chrome window and a small "gstack" pill in the bottom-right corner.
|
||||
> Chrome is launched with gstack control. You should see Playwright's Chromium
|
||||
> (not your regular Chrome) with a golden shimmer line at the top of the page.
|
||||
>
|
||||
> The Side Panel extension is pre-loaded. To open it:
|
||||
> 1. Look for the **puzzle piece icon** (Extensions) in Chrome's toolbar
> 2. Click it → find **gstack browse** → click the **pin icon** to pin it
> 3. Click the **gstack icon** in the toolbar
> 4. Click **Open Side Panel**
> The Side Panel extension should be auto-loaded. To open it:
> 1. Look for the **puzzle piece icon** (Extensions) in the toolbar — it may
> already show the gstack icon if the extension loaded successfully
> 2. Click the **puzzle piece** → find **gstack browse** → click the **pin icon**
> 3. Click the pinned **gstack icon** in the toolbar
> 4. The Side Panel should open on the right showing a live activity feed
>
> The Side Panel shows a live feed of every browse command in real time.
>
> **Port:** The browse server is on port {PORT} — the extension auto-detects it
> if you're using the Playwright-controlled Chrome. If the badge stays gray, click
> the gstack icon and enter port {PORT} manually.
> **Port:** 34567 (auto-detected — the extension connects automatically in the
> Playwright-controlled Chrome).

Options:
- A) I can see the Side Panel — let's go!
@@ -73,22 +115,34 @@ Options:
- C) Something went wrong

If B: Tell the user:
> The extension should be auto-loaded, but Chrome sometimes doesn't show it
> immediately. Try:
> 1. Type `chrome://extensions` in the address bar
> 2. Look for "gstack browse" — it should be listed and enabled
> 3. If not listed, click "Load unpacked" → navigate to the extension folder
> (press Cmd+Shift+G in the file picker, paste this path):
> `{EXTENSION_PATH}`
>
> Then pin it from the puzzle piece icon and open the Side Panel.

If C: Run `$B status` and show the output. Check if the server is healthy.
> The extension is loaded into Playwright's Chromium at launch time, but
> sometimes it doesn't appear immediately. Try these steps:
>
> 1. Type `chrome://extensions` in the address bar
> 2. Look for **"gstack browse"** — it should be listed and enabled
> 3. If it's there but not pinned, go back to any page, click the puzzle piece
> icon, and pin it
> 4. If it's NOT listed at all, click **"Load unpacked"** and navigate to:
> - Press **Cmd+Shift+G** in the file picker dialog
> - Paste this path: `{EXTENSION_PATH}` (use the path from Step 2)
> - Click **Select**
>
> After loading, pin it and click the icon to open the Side Panel.
>
> If the Side Panel badge stays gray (disconnected), click the gstack icon
> and enter port **34567** manually.

If C:

1. Run `$B status` and show the output
2. If the server is not healthy, re-run Step 0 cleanup + Step 1 connect
3. If the server IS healthy but the browser isn't visible, try `$B focus`
4. If that fails, ask the user what they see (error message, blank screen, etc.)

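If the user picks C, the numbered checks above can be grounded in one concrete probe. This sketch assumes the same `.gstack/browse.json` layout that the Step 0 cleanup reads; the `"pid"` field name comes from that snippet and is otherwise an assumption:

```shell
# Approximate the liveness check behind `$B status` by hand:
# read the recorded pid and test whether that process is still alive.
STATE_FILE="$(git rev-parse --show-toplevel 2>/dev/null)/.gstack/browse.json"
if [ -f "$STATE_FILE" ]; then
  PID=$(grep -o '"pid":[0-9]*' "$STATE_FILE" | grep -o '[0-9]*' || true)
  if [ -n "$PID" ] && kill -0 "$PID" 2>/dev/null; then
    echo "server alive (pid $PID)"
  else
    echo "stale state file: recorded pid is gone"
  fi
else
  echo "no browse.json: server never started or was already cleaned up"
fi
```

`kill -0` sends no signal; it only asks whether the pid exists, which is why it is safe to run before deciding anything.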
## Step 4: Demo

After the user confirms the Side Panel is working, run a quick demo so they
can see the activity feed in action:
After the user confirms the Side Panel is working, run a quick demo:

```bash
$B goto https://news.ycombinator.com
@@ -101,7 +155,7 @@ $B snapshot -i
```

Tell the user: "Check the Side Panel — you should see the `goto` and `snapshot`
commands appear in the activity feed. Every command Claude runs will show up here
commands appear in the activity feed. Every command Claude runs shows up here
in real time."

## Step 5: Sidebar chat
@@ -109,8 +163,9 @@ in real time."
After the activity feed demo, tell the user about the sidebar chat:

> The Side Panel also has a **chat tab**. Try typing a message like "take a
> snapshot and describe this page." A child Claude instance will execute your
> request in the browser — you'll see the commands appear in the activity feed.
> snapshot and describe this page." A sidebar agent (a child Claude instance)
> executes your request in the browser — you'll see the commands appear in
> the activity feed as they happen.
>
> The sidebar agent can navigate pages, click buttons, fill forms, and read
> content. Each task gets up to 5 minutes. It runs in an isolated session, so
@@ -120,17 +175,28 @@ After the activity feed demo, tell the user about the sidebar chat:

Tell the user:

> You're all set! Chrome is under Claude's control with the Side Panel showing
> live activity and a chat sidebar for direct commands. Here's what you can do:
> You're all set! Here's what you can do with the connected Chrome:
>
> - **Chat in the sidebar** — type natural language instructions and Claude
> executes them in the browser
> - **Run any browse command** — `$B goto`, `$B click`, `$B snapshot` — and
> watch it happen in Chrome + the Side Panel
> - **Use /qa or /design-review** — they'll run in the visible Chrome window
> instead of headless. No cookie import needed.
> - **`$B focus`** — bring Chrome to the foreground anytime
> - **`$B disconnect`** — return to headless mode when done
> **Watch Claude work in real time:**
> - Run any gstack skill (`/qa`, `/design-review`, `/benchmark`) and watch
> every action happen in the visible Chrome window + Side Panel feed
> - No cookie import needed — the Playwright browser shares its own session
>
> **Control the browser directly:**
> - **Sidebar chat** — type natural language in the Side Panel and the sidebar
> agent executes it (e.g., "fill in the login form and submit")
> - **Browse commands** — `$B goto <url>`, `$B click <sel>`, `$B fill <sel> <val>`,
> `$B snapshot -i` — all visible in Chrome + Side Panel
>
> **Window management:**
> - `$B focus` — bring Chrome to the foreground anytime
> - `$B disconnect` — close headed Chrome and return to headless mode
>
> **What skills look like in headed mode:**
> - `/qa` runs its full test suite in the visible browser — you see every page
> load, every click, every assertion
> - `/design-review` takes screenshots in the real browser — same pixels you see
> - `/benchmark` measures performance in the headed browser

Then proceed with whatever the user asked to do. If they didn't specify a task,
ask what they'd like to test or browse.

+4 -3
@@ -358,7 +358,7 @@ ls go.mod 2>/dev/null && echo "STACK: Go"
ls Cargo.toml 2>/dev/null && echo "STACK: Rust"
ls pom.xml build.gradle 2>/dev/null && echo "STACK: JVM"
ls composer.json 2>/dev/null && echo "STACK: PHP"
ls *.csproj *.sln 2>/dev/null && echo "STACK: .NET"
find . -maxdepth 1 \( -name '*.csproj' -o -name '*.sln' \) 2>/dev/null | grep -q . && echo "STACK: .NET"
```
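The `ls *.csproj` to `find` swap in this hunk exists because an unmatched glob is a hard error under zsh's default `nomatch`, while `find | grep -q .` is quiet and behaves the same in every POSIX shell. A throwaway-directory sketch (the `app.csproj` name is invented):

```shell
# An unmatched glob like `ls *.csproj` errors out under zsh;
# the find-based probe simply produces no output.
WORK=$(mktemp -d)
cd "$WORK"
if find . -maxdepth 1 \( -name '*.csproj' -o -name '*.sln' \) | grep -q .; then
  echo "STACK: .NET"
else
  echo "no .NET project files yet"
fi
touch app.csproj   # hypothetical project file
if find . -maxdepth 1 \( -name '*.csproj' -o -name '*.sln' \) | grep -q .; then
  echo "STACK: .NET"
fi
cd / && rm -rf "$WORK"
```

The first probe prints the fallback line; after the file appears, the second probe prints `STACK: .NET`.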

**Framework detection:**
@@ -395,7 +395,8 @@ Map what an attacker sees — both code surface and infrastructure surface.

**Infrastructure surface:**
```bash
ls .github/workflows/*.yml .github/workflows/*.yaml .gitlab-ci.yml 2>/dev/null | wc -l
setopt +o nomatch 2>/dev/null || true # zsh compat
{ find .github/workflows -maxdepth 1 \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null; [ -f .gitlab-ci.yml ] && echo .gitlab-ci.yml; } | wc -l
find . -maxdepth 4 \( -name "Dockerfile*" -o -name "docker-compose*.yml" \) 2>/dev/null
find . -maxdepth 4 \( -name "*.tf" -o -name "*.tfvars" -o -name "kustomization.yaml" \) 2>/dev/null
ls .env .env.* 2>/dev/null
@@ -445,7 +446,7 @@ grep -q "^\.env$\|^\.env\.\*" .gitignore 2>/dev/null && echo ".env IS gitignored

**CI configs with inline secrets (not using secret stores):**
```bash
for f in .github/workflows/*.yml .github/workflows/*.yaml .gitlab-ci.yml .circleci/config.yml; do
for f in $(find .github/workflows -maxdepth 1 \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null) .gitlab-ci.yml .circleci/config.yml; do
[ -f "$f" ] && grep -n "password:\|token:\|secret:\|api_key:" "$f" | grep -v '\${{' | grep -v 'secrets\.'
done 2>/dev/null
```
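The two `grep -v` filters are what separate hardcoded values from store-backed references: anything mentioning `${{ … }}` or a `secrets.` lookup is dropped, and only inline literals survive. A self-contained check against a fake workflow (file name and key values invented for the demo):

```shell
WORK=$(mktemp -d)
cat > "$WORK/ci.yml" <<'EOF'
env:
  api_key: "hunter2"               # inline literal, should be flagged
  token: ${{ secrets.API_TOKEN }}  # store-backed, should be ignored
EOF
# Same match-then-filter pipeline as the skill uses:
grep -n "password:\|token:\|secret:\|api_key:" "$WORK/ci.yml" | grep -v '\${{' | grep -v 'secrets\.'
rm -rf "$WORK"
```

Only the `api_key` line is printed; the `token` line matches the first grep but is removed by the `\${{` filter.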

+4 -3
@@ -73,7 +73,7 @@ ls go.mod 2>/dev/null && echo "STACK: Go"
ls Cargo.toml 2>/dev/null && echo "STACK: Rust"
ls pom.xml build.gradle 2>/dev/null && echo "STACK: JVM"
ls composer.json 2>/dev/null && echo "STACK: PHP"
ls *.csproj *.sln 2>/dev/null && echo "STACK: .NET"
find . -maxdepth 1 \( -name '*.csproj' -o -name '*.sln' \) 2>/dev/null | grep -q . && echo "STACK: .NET"
```

**Framework detection:**
@@ -110,7 +110,8 @@ Map what an attacker sees — both code surface and infrastructure surface.

**Infrastructure surface:**
```bash
ls .github/workflows/*.yml .github/workflows/*.yaml .gitlab-ci.yml 2>/dev/null | wc -l
setopt +o nomatch 2>/dev/null || true # zsh compat
{ find .github/workflows -maxdepth 1 \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null; [ -f .gitlab-ci.yml ] && echo .gitlab-ci.yml; } | wc -l
find . -maxdepth 4 \( -name "Dockerfile*" -o -name "docker-compose*.yml" \) 2>/dev/null
find . -maxdepth 4 \( -name "*.tf" -o -name "*.tfvars" -o -name "kustomization.yaml" \) 2>/dev/null
ls .env .env.* 2>/dev/null
@@ -160,7 +161,7 @@ grep -q "^\.env$\|^\.env\.\*" .gitignore 2>/dev/null && echo ".env IS gitignored

**CI configs with inline secrets (not using secret stores):**
```bash
for f in .github/workflows/*.yml .github/workflows/*.yaml .gitlab-ci.yml .circleci/config.yml; do
for f in $(find .github/workflows -maxdepth 1 \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null) .gitlab-ci.yml .circleci/config.yml; do
[ -f "$f" ] && grep -n "password:\|token:\|secret:\|api_key:" "$f" | grep -v '\${{' | grep -v 'secrets\.'
done 2>/dev/null
```

@@ -356,6 +356,7 @@ ls src/ app/ pages/ components/ 2>/dev/null | head -30
Look for office-hours output:

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
ls ~/.gstack/projects/$SLUG/*office-hours* 2>/dev/null | head -5
ls .context/*office-hours* .context/attachments/*office-hours* 2>/dev/null | head -5
@@ -472,6 +473,7 @@ which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
1. **Codex design voice** (via Bash):
```bash
TMPERR_DESIGN=$(mktemp /tmp/codex-design-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Given this product context, propose a complete design direction:
- Visual thesis: one sentence describing mood, material, and energy
- Typography: specific font names (not defaults — no Inter/Roboto/Arial/system) + hex colors
@@ -480,7 +482,7 @@ codex exec "Given this product context, propose a complete design direction:
- Differentiation: 2 deliberate departures from category norms
- Anti-slop: no purple gradients, no 3-column icon grids, no centered everything, no decorative blobs

Be opinionated. Be specific. Do not hedge. This is YOUR design direction — own it." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_DESIGN"
Be opinionated. Be specific. Do not hedge. This is YOUR design direction — own it." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_DESIGN"
```
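The `_REPO_ROOT` guard introduced in this hunk fails fast instead of passing an empty `-C` path to `codex exec` when the command runs outside a git repo. The pattern generalizes to any required command substitution; in the sketch below, `false` stands in for the command that can fail:

```shell
# Guard pattern: capture the value or bail with a clear error.
demo_guard() {
  _ROOT=$("$@") || { echo "ERROR: prerequisite failed" >&2; return 1; }
  echo "got: $_ROOT"
}
demo_guard echo /srv/repo        # succeeds and prints the captured value
demo_guard false || true         # fails, prints the error, returns nonzero
```

The `|| { …; exit 1; }` attached to the assignment fires because an assignment takes the exit status of its command substitution.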
Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
```bash

@@ -53,6 +53,7 @@ ls src/ app/ pages/ components/ 2>/dev/null | head -30
Look for office-hours output:

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
{{SLUG_EVAL}}
ls ~/.gstack/projects/$SLUG/*office-hours* 2>/dev/null | head -5
ls .context/*office-hours* .context/attachments/*office-hours* 2>/dev/null | head -5

@@ -401,6 +401,7 @@ If `NEEDS_SETUP`:
**Detect existing test framework and project runtime:**

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
@@ -997,6 +998,7 @@ which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
1. **Codex design voice** (via Bash):
```bash
TMPERR_DESIGN=$(mktemp /tmp/codex-design-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Review the frontend source code in this repo. Evaluate against these design hard rules:
- Spacing: systematic (design tokens / CSS variables) or magic numbers?
- Typography: expressive purposeful fonts or default stacks?
@@ -1026,7 +1028,7 @@ HARD REJECTION — flag if ANY apply:
6. Carousel with no narrative purpose
7. App UI made of stacked cards instead of layout

Be specific. Reference file:line for every finding." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DESIGN"
Be specific. Reference file:line for every finding." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DESIGN"
```
Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
```bash

@@ -280,7 +280,7 @@ committing.
git commit -m "$(cat <<'EOF'
docs: update project documentation for vX.Y.Z.W

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
{{CO_AUTHOR_TRAILER}}
EOF
)"
```

+17 -11
@@ -194,17 +194,23 @@ chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
      sendResponse({ error: 'Not connected' });
      return true;
    }
    fetch(`${base}/sidebar-command`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${authToken}`,
      },
      body: JSON.stringify({ message: msg.message }),
    })
      .then(r => r.json())
      .then(data => sendResponse(data))
      .catch(err => sendResponse({ error: err.message }));
    // Capture the active tab's URL so the sidebar agent knows what page
    // the user is actually looking at (Playwright's page.url() can be stale
    // if the user navigated manually in headed mode).
    chrome.tabs.query({ active: true, currentWindow: true }, (tabs) => {
      const activeTabUrl = tabs?.[0]?.url || null;
      fetch(`${base}/sidebar-command`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'Authorization': `Bearer ${authToken}`,
        },
        body: JSON.stringify({ message: msg.message, activeTabUrl }),
      })
        .then(r => r.json())
        .then(data => sendResponse(data))
        .catch(err => sendResponse({ error: err.message }));
    });
    return true;
  }
});

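The new listener forwards the active tab URL alongside the chat message. The request it builds can be reproduced from a terminal; the `/sidebar-command` path, bearer header, and port 34567 are taken from the snippets above, while the URL value and the `AUTH_TOKEN` variable name are invented for the demo:

```shell
# Build the JSON body the extension now sends: chat message + active tab URL.
MESSAGE="describe this page"
ACTIVE_URL="https://example.com/pricing"
BODY=$(printf '{"message":"%s","activeTabUrl":"%s"}' "$MESSAGE" "$ACTIVE_URL")
echo "$BODY"
# With the server running and a valid token, the extension's fetch is
# equivalent to:
#   curl -s -X POST "http://127.0.0.1:34567/sidebar-command" \
#     -H "Content-Type: application/json" \
#     -H "Authorization: Bearer $AUTH_TOKEN" \
#     -d "$BODY"
```

Because `activeTabUrl` rides along in the same POST, the server needs no extra round trip to learn what page the user is looking at.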
@@ -470,7 +470,7 @@ else
SAVED_HASH=$(cat ~/.gstack/projects/$SLUG/land-deploy-confirmed 2>/dev/null)
CURRENT_HASH=$(sed -n '/## Deploy Configuration/,/^## /p' CLAUDE.md 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
# Also hash workflow files that affect deploy behavior
WORKFLOW_HASH=$(cat .github/workflows/*deploy* .github/workflows/*cd* 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
WORKFLOW_HASH=$(find .github/workflows -maxdepth 1 \( -name '*deploy*' -o -name '*cd*' \) 2>/dev/null | xargs cat 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
COMBINED_HASH="${CURRENT_HASH}-${WORKFLOW_HASH}"
if [ "$SAVED_HASH" != "$COMBINED_HASH" ] && [ -n "$SAVED_HASH" ]; then
echo "CONFIG_CHANGED"
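The combined fingerprint changes whenever either input changes, which is exactly what the `CONFIG_CHANGED` branch tests; hashing an empty stream still yields a stable value, so missing files do not break the comparison. A sketch with throwaway files, where `cksum` stands in for `shasum -a 256` so it runs without perl:

```shell
W=$(mktemp -d)
echo "deploy: vercel" > "$W/deploy.md"
echo "on: push"       > "$W/ci.yml"
fp() { cksum < "$1" | cut -d' ' -f1; }   # stand-in for `shasum -a 256 | cut -d' ' -f1`
H1="$(fp "$W/deploy.md")-$(fp "$W/ci.yml")"
# Edit a workflow file and the combined fingerprint no longer matches.
echo "on: [push, pull_request]" > "$W/ci.yml"
H2="$(fp "$W/deploy.md")-$(fp "$W/ci.yml")"
[ "$H1" != "$H2" ] && echo "CONFIG_CHANGED"
rm -rf "$W"
```

Joining the two digests with `-` keeps them in one string, so a single comparison covers both sources of drift.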
@@ -527,7 +527,7 @@ fi
([ -f railway.json ] || [ -f railway.toml ]) && echo "PLATFORM:railway"

# Detect deploy workflows
for f in .github/workflows/*.yml .github/workflows/*.yaml; do
for f in $(find .github/workflows -maxdepth 1 \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null); do
[ -f "$f" ] && grep -qiE "deploy|release|production|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f"
[ -f "$f" ] && grep -qiE "staging" "$f" 2>/dev/null && echo "STAGING_WORKFLOW:$f"
done
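The `$(find …)` form fixes the unmatched-glob failure but still word-splits on spaces in paths. Workflow file names rarely contain spaces, so that is a pragmatic trade-off; a `while read` loop is the space-safe variant if it ever matters (directory and file names below are invented):

```shell
W=$(mktemp -d)
mkdir -p "$W/.github/workflows"
touch "$W/.github/workflows/deploy prod.yml"   # hostile name with a space
# Space-safe iteration over find results:
find "$W/.github/workflows" -maxdepth 1 -name '*.yml' 2>/dev/null |
while IFS= read -r f; do
  [ -f "$f" ] && echo "DEPLOY_WORKFLOW:$f"
done
rm -rf "$W"
```

The loop prints exactly one `DEPLOY_WORKFLOW:` line with the space intact, where `for f in $(find …)` would have produced two broken fragments.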
@@ -613,7 +613,7 @@ grep -i "staging" CLAUDE.md 2>/dev/null | head -3

2. **GitHub Actions staging workflow:** Check for workflow files with "staging" in the name or content:
```bash
for f in .github/workflows/*.yml .github/workflows/*.yaml; do
for f in $(find .github/workflows -maxdepth 1 \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null); do
[ -f "$f" ] && grep -qiE "staging" "$f" 2>/dev/null && echo "STAGING_WORKFLOW:$f"
done
```
@@ -663,7 +663,7 @@ Save the deploy config fingerprint so we can detect future changes:
```bash
mkdir -p ~/.gstack/projects/$SLUG
CURRENT_HASH=$(sed -n '/## Deploy Configuration/,/^## /p' CLAUDE.md 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
WORKFLOW_HASH=$(cat .github/workflows/*deploy* .github/workflows/*cd* 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
WORKFLOW_HASH=$(find .github/workflows -maxdepth 1 \( -name '*deploy*' -o -name '*cd*' \) 2>/dev/null | xargs cat 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
echo "${CURRENT_HASH}-${WORKFLOW_HASH}" > ~/.gstack/projects/$SLUG/land-deploy-confirmed
```
Continue to Step 2.
@@ -805,6 +805,7 @@ If tests fail: **BLOCKER.** Cannot merge with failing tests.
**E2E tests — check recent results:**

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
ls -t ~/.gstack-dev/evals/*-e2e-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -20
```

@@ -820,6 +821,7 @@ If E2E results exist but have failures: **WARNING — N tests failed.** List the
**LLM judge evals — check recent results:**

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
ls -t ~/.gstack-dev/evals/*-llm-judge-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -5
```

@@ -1025,7 +1027,7 @@ fi
([ -f railway.json ] || [ -f railway.toml ]) && echo "PLATFORM:railway"

# Detect deploy workflows
for f in .github/workflows/*.yml .github/workflows/*.yaml; do
for f in $(find .github/workflows -maxdepth 1 \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null); do
[ -f "$f" ] && grep -qiE "deploy|release|production|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f"
[ -f "$f" ] && grep -qiE "staging" "$f" 2>/dev/null && echo "STAGING_WORKFLOW:$f"
done
@@ -113,7 +113,7 @@ else
SAVED_HASH=$(cat ~/.gstack/projects/$SLUG/land-deploy-confirmed 2>/dev/null)
CURRENT_HASH=$(sed -n '/## Deploy Configuration/,/^## /p' CLAUDE.md 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
# Also hash workflow files that affect deploy behavior
WORKFLOW_HASH=$(cat .github/workflows/*deploy* .github/workflows/*cd* 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
WORKFLOW_HASH=$(find .github/workflows -maxdepth 1 \( -name '*deploy*' -o -name '*cd*' \) 2>/dev/null | xargs cat 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
COMBINED_HASH="${CURRENT_HASH}-${WORKFLOW_HASH}"
if [ "$SAVED_HASH" != "$COMBINED_HASH" ] && [ -n "$SAVED_HASH" ]; then
echo "CONFIG_CHANGED"
@@ -223,7 +223,7 @@ grep -i "staging" CLAUDE.md 2>/dev/null | head -3

2. **GitHub Actions staging workflow:** Check for workflow files with "staging" in the name or content:
```bash
for f in .github/workflows/*.yml .github/workflows/*.yaml; do
for f in $(find .github/workflows -maxdepth 1 \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null); do
[ -f "$f" ] && grep -qiE "staging" "$f" 2>/dev/null && echo "STAGING_WORKFLOW:$f"
done
```
@@ -273,7 +273,7 @@ Save the deploy config fingerprint so we can detect future changes:
```bash
mkdir -p ~/.gstack/projects/$SLUG
CURRENT_HASH=$(sed -n '/## Deploy Configuration/,/^## /p' CLAUDE.md 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
WORKFLOW_HASH=$(cat .github/workflows/*deploy* .github/workflows/*cd* 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
WORKFLOW_HASH=$(find .github/workflows -maxdepth 1 \( -name '*deploy*' -o -name '*cd*' \) 2>/dev/null | xargs cat 2>/dev/null | shasum -a 256 | cut -d' ' -f1)
echo "${CURRENT_HASH}-${WORKFLOW_HASH}" > ~/.gstack/projects/$SLUG/land-deploy-confirmed
```
Continue to Step 2.
@@ -415,6 +415,7 @@ If tests fail: **BLOCKER.** Cannot merge with failing tests.
**E2E tests — check recent results:**

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
ls -t ~/.gstack-dev/evals/*-e2e-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -20
```

@@ -430,6 +431,7 @@ If E2E results exist but have failures: **WARNING — N tests failed.** List the
**LLM judge evals — check recent results:**

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
ls -t ~/.gstack-dev/evals/*-llm-judge-*-$(date +%Y-%m-%d)*.json 2>/dev/null | head -5
```

@@ -368,6 +368,7 @@ eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
3. Use Grep/Glob to map the codebase areas most relevant to the user's request.
4. **List existing design docs for this project:**
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
ls -t ~/.gstack/projects/$SLUG/*-design-*.md 2>/dev/null
```
If design docs exist, list them: "Prior designs for this project: [titles + dates]"
@@ -598,6 +599,7 @@ After the user states the problem (first question in Phase 2A or 2B), search exi

Extract 3-5 significant keywords from the user's problem statement and grep across design docs:
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
grep -li "<keyword1>\|<keyword2>\|<keyword3>" ~/.gstack/projects/$SLUG/*-design-*.md 2>/dev/null
```
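`grep -li` ORs the escaped keywords, matches case-insensitively, and prints only the names of matching files, which is all this relevance search needs. A throwaway sketch (keywords and file names are invented):

```shell
W=$(mktemp -d)
echo "Auth flow with magic links" > "$W/20240101-main-design-auth.md"
echo "Billing page layout"        > "$W/20240102-main-design-billing.md"
# -l: print file names only; -i: case-insensitive; \| ORs the patterns.
grep -li "auth\|login\|magic" "$W"/*-design-*.md 2>/dev/null
rm -rf "$W"
```

Only the auth doc is listed; the billing doc matches none of the keywords and stays silent.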

@@ -714,7 +716,8 @@ Write the full prompt (context block + instructions) to this file. Use the mode-

```bash
TMPERR_OH=$(mktemp /tmp/codex-oh-err-XXXXXXXX)
codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_OH"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_OH"
```

Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
@@ -865,7 +868,8 @@ If user chooses A, launch both voices simultaneously:
1. **Codex** (via Bash, `model_reasoning_effort="medium"`):
```bash
TMPERR_SKETCH=$(mktemp /tmp/codex-sketch-XXXXXXXX)
codex exec "For this product approach, provide: a visual thesis (one sentence — mood, material, energy), a content plan (hero → support → detail → CTA), and 2 interaction ideas that change page feel. Apply beautiful defaults: composition-first, brand-first, cardless, poster not document. Be opinionated." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_SKETCH"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "For this product approach, provide: a visual thesis (one sentence — mood, material, energy), a content plan (hero → support → detail → CTA), and 2 interaction ideas that change page feel. Apply beautiful defaults: composition-first, brand-first, cardless, poster not document. Be opinionated." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_SKETCH"
```
Use a 5-minute timeout (`timeout: 300000`). After completion: `cat "$TMPERR_SKETCH" && rm -f "$TMPERR_SKETCH"`

@@ -907,6 +911,7 @@ DATETIME=$(date +%Y%m%d-%H%M%S)

**Design lineage:** Before writing, check for existing design docs on this branch:
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
PRIOR=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
```
If `$PRIOR` exists, the new doc gets a `Supersedes:` field referencing it. This creates a revision chain — you can trace how a design evolved across office hours sessions.
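`ls -t | head -1` is what makes the design-lineage chain work: the file with the newest modification time wins. A sketch with synthetic timestamps, where `touch -t` pins the mtimes explicitly (file names invented):

```shell
W=$(mktemp -d)
touch -t 202401010000 "$W/20240101-main-design-a.md"   # older
touch -t 202406010000 "$W/20240601-main-design-b.md"   # newer
PRIOR=$(ls -t "$W"/*-design-*.md 2>/dev/null | head -1)
echo "$PRIOR"
rm -rf "$W"
```

The newer June doc is printed, so the next doc written would carry `Supersedes:` pointing at it.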

@@ -48,6 +48,7 @@ Understand the project and the area the user wants to change.
3. Use Grep/Glob to map the codebase areas most relevant to the user's request.
4. **List existing design docs for this project:**
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
ls -t ~/.gstack/projects/$SLUG/*-design-*.md 2>/dev/null
```
If design docs exist, list them: "Prior designs for this project: [titles + dates]"
@@ -278,6 +279,7 @@ After the user states the problem (first question in Phase 2A or 2B), search exi

Extract 3-5 significant keywords from the user's problem statement and grep across design docs:
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
grep -li "<keyword1>\|<keyword2>\|<keyword3>" ~/.gstack/projects/$SLUG/*-design-*.md 2>/dev/null
```

@@ -422,6 +424,7 @@ DATETIME=$(date +%Y%m%d-%H%M%S)

**Design lineage:** Before writing, check for existing design docs on this branch:
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
PRIOR=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
```
If `$PRIOR` exists, the new doc gets a `Supersedes:` field referencing it. This creates a revision chain — you can trace how a design evolved across office hours sessions.

+2 -2
@@ -1,6 +1,6 @@
{
  "name": "gstack",
  "version": "0.12.5.0",
  "version": "0.12.8.1",
  "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
  "license": "MIT",
  "type": "module",
@@ -8,7 +8,7 @@
  "browse": "./browse/dist/browse"
},
"scripts": {
  "build": "bun run gen:skill-docs && bun run gen:skill-docs --host codex && bun build --compile browse/src/cli.ts --outfile browse/dist/browse && bun build --compile browse/src/find-browse.ts --outfile browse/dist/find-browse && bun build --compile bin/gstack-global-discover.ts --outfile bin/gstack-global-discover && bash browse/scripts/build-node-server.sh && git rev-parse HEAD > browse/dist/.version && rm -f .*.bun-build || true",
  "build": "bun run gen:skill-docs; bun run gen:skill-docs --host codex; bun build --compile browse/src/cli.ts --outfile browse/dist/browse && bun build --compile browse/src/find-browse.ts --outfile browse/dist/find-browse && bun build --compile bin/gstack-global-discover.ts --outfile bin/gstack-global-discover && bash browse/scripts/build-node-server.sh && git rev-parse HEAD > browse/dist/.version && rm -f .*.bun-build || true",
  "gen:skill-docs": "bun run scripts/gen-skill-docs.ts",
  "dev": "bun run browse/src/cli.ts",
  "server": "bun run browse/src/server.ts",

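The build-script hunk swaps `&&` for `;` between the two `gen:skill-docs` runs, so a failing codex-host doc generation no longer aborts the compile steps that follow it. The difference in two lines (the inner `sh -c` keeps the failures contained):

```shell
# `&&` stops the chain when the left side fails; `;` carries on regardless.
sh -c 'false && echo "ran anyway"' || true   # prints nothing
sh -c 'false ;  echo "ran anyway"'           # prints: ran anyway
```

The trade-off is that `;` also hides real failures, which is acceptable here only because the doc-generation steps are non-critical to the compiled binaries.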
@@ -445,6 +445,7 @@ Then read CLAUDE.md, TODOS.md, and any existing architecture docs.

**Design doc check:**
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
SLUG=$(~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-' || echo 'no-branch')
DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
@@ -455,6 +456,7 @@ If a design doc exists (from `/office-hours`), read it. Use it as the source of

**Handoff note check** (reuses $SLUG and $BRANCH from the design doc check above):
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
HANDOFF=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-ceo-handoff-*.md 2>/dev/null | head -1)
[ -n "$HANDOFF" ] && echo "HANDOFF_FOUND: $HANDOFF" || echo "NO_HANDOFF"
```
@@ -509,6 +511,7 @@ If the Read fails (file not found), say:

After /office-hours completes, re-run the design doc check:
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
SLUG=$(~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-' || echo 'no-branch')
DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
@@ -1091,7 +1094,8 @@ THE PLAN:

```bash
TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV"
```

Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
@@ -1269,6 +1273,7 @@ After producing the Completion Summary, clean up any handoff notes for this bran
the review is complete and the context is no longer needed.

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
rm -f ~/.gstack/projects/$SLUG/*-$BRANCH-ceo-handoff-*.md 2>/dev/null || true
```

@@ -105,6 +105,7 @@ Then read CLAUDE.md, TODOS.md, and any existing architecture docs.

**Design doc check:**
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
SLUG=$(~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-' || echo 'no-branch')
DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
@@ -115,6 +116,7 @@ If a design doc exists (from `/office-hours`), read it. Use it as the source of

**Handoff note check** (reuses $SLUG and $BRANCH from the design doc check above):
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
HANDOFF=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-ceo-handoff-*.md 2>/dev/null | head -1)
[ -n "$HANDOFF" ] && echo "HANDOFF_FOUND: $HANDOFF" || echo "NO_HANDOFF"
```
@@ -703,6 +705,7 @@ After producing the Completion Summary, clean up any handoff notes for this bran
the review is complete and the context is no longer needed.

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
{{SLUG_EVAL}}
rm -f ~/.gstack/projects/$SLUG/*-$BRANCH-ceo-handoff-*.md 2>/dev/null || true
```

@@ -489,6 +489,7 @@ which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
1. **Codex design voice** (via Bash):
```bash
TMPERR_DESIGN=$(mktemp /tmp/codex-design-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Read the plan file at [plan-file-path]. Evaluate this plan's UI/UX design against these criteria.

HARD REJECTION — flag if ANY apply:
@@ -514,7 +515,7 @@ HARD RULES — first classify as MARKETING/LANDING PAGE vs APP UI vs HYBRID, the
- APP UI: Calm surface hierarchy, dense but readable, utility language, minimal chrome
- UNIVERSAL: CSS variables for colors, no default font stacks, one job per section, cards earn existence

For each finding: what's wrong, what will happen if it ships unresolved, and the specific fix. Be opinionated. No hedging." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DESIGN"
For each finding: what's wrong, what will happen if it ships unresolved, and the specific fix. Be opinionated. No hedging." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DESIGN"
```
Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
```bash

@@ -371,6 +371,7 @@ When evaluating architecture, think "boring by default." When reviewing tests, t

### Design Doc Check
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
SLUG=$(~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-' || echo 'no-branch')
DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
@@ -420,6 +421,7 @@ If the Read fails (file not found), say:

After /office-hours completes, re-run the design doc check:
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
SLUG=$(~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-' || echo 'no-branch')
DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
@@ -497,6 +499,7 @@ Before analyzing coverage, detect the project's test framework:
2. **If CLAUDE.md has no testing section, auto-detect:**

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
@@ -749,7 +752,8 @@ THE PLAN:

```bash
TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV"
```

Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
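The temp-file stderr capture above can be sketched end to end; `ls` on a missing path stands in for a failing `codex exec` (all names here are illustrative):

```shell
TMPERR=$(mktemp /tmp/demo-err-XXXXXXXX)
ls /no/such/path 2>"$TMPERR" || true   # stderr lands in the temp file; failure is tolerated
cat "$TMPERR"                          # surface whatever the command wrote to stderr
rm -f "$TMPERR"                        # always clean up the temp file
```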
@@ -860,6 +864,33 @@ For each new codepath identified in the test review diagram, list one realistic

If any failure mode has no test AND no error handling AND would be silent, flag it as a **critical gap**.

### Worktree parallelization strategy

Analyze the plan's implementation steps for parallel execution opportunities. This helps the user split work across git worktrees (via Claude Code's Agent tool with `isolation: "worktree"` or parallel workspaces).

**Skip if:** all steps touch the same primary module, or the plan has fewer than 2 independent workstreams. In that case, write: "Sequential implementation, no parallelization opportunity."

**Otherwise, produce:**

1. **Dependency table** — for each implementation step/workstream:

| Step | Modules touched | Depends on |
|------|----------------|------------|
| (step name) | (directories/modules, NOT specific files) | (other steps, or —) |

Work at the module/directory level, not file level. Plans describe intent ("add API endpoints"), not specific files. Module-level ("controllers/, models/") is reliable; file-level is guesswork.

2. **Parallel lanes** — group steps into lanes:
- Steps with no shared modules and no dependency go in separate lanes (parallel)
- Steps sharing a module directory go in the same lane (sequential)
- Steps depending on other steps go in later lanes

Format: `Lane A: step1 → step2 (sequential, shared models/)` / `Lane B: step3 (independent)`

3. **Execution order** — which lanes launch in parallel, which wait. Example: "Launch A + B in parallel worktrees. Merge both. Then C."

4. **Conflict flags** — if two parallel lanes touch the same module directory, flag it: "Lanes X and Y both touch module/ — potential merge conflict. Consider sequential execution or careful coordination."

### Completion summary
At the end of the review, fill in and display this summary so the user can see all findings at a glance:
- Step 0: Scope Challenge — ___ (scope accepted as-is / scope reduced per recommendation)
@@ -872,6 +903,7 @@ At the end of the review, fill in and display this summary so the user can see a
- TODOS.md updates: ___ items proposed to user
- Failure modes: ___ critical gaps flagged
- Outside voice: ran (codex/claude) / skipped
- Parallelization: ___ lanes, ___ parallel / ___ sequential
- Lake Score: X/Y recommendations chose complete option

## Retrospective learning

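The lanes above map directly onto `git worktree`; a minimal sketch, with illustrative branch and directory names:

```shell
# One worktree per parallel lane, each on its own branch
git worktree add ../proj-lane-a -b lane-a   # Lane A
git worktree add ../proj-lane-b -b lane-b   # Lane B
# Implement each lane in its own checkout, merge both, then start any dependent lane
```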
@@ -68,6 +68,7 @@ When evaluating architecture, think "boring by default." When reviewing tests, t

### Design Doc Check
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
SLUG=$(~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-' || echo 'no-branch')
DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
@@ -196,6 +197,33 @@ For each new codepath identified in the test review diagram, list one realistic

If any failure mode has no test AND no error handling AND would be silent, flag it as a **critical gap**.

### Worktree parallelization strategy

Analyze the plan's implementation steps for parallel execution opportunities. This helps the user split work across git worktrees (via Claude Code's Agent tool with `isolation: "worktree"` or parallel workspaces).

**Skip if:** all steps touch the same primary module, or the plan has fewer than 2 independent workstreams. In that case, write: "Sequential implementation, no parallelization opportunity."

**Otherwise, produce:**

1. **Dependency table** — for each implementation step/workstream:

| Step | Modules touched | Depends on |
|------|----------------|------------|
| (step name) | (directories/modules, NOT specific files) | (other steps, or —) |

Work at the module/directory level, not file level. Plans describe intent ("add API endpoints"), not specific files. Module-level ("controllers/, models/") is reliable; file-level is guesswork.

2. **Parallel lanes** — group steps into lanes:
- Steps with no shared modules and no dependency go in separate lanes (parallel)
- Steps sharing a module directory go in the same lane (sequential)
- Steps depending on other steps go in later lanes

Format: `Lane A: step1 → step2 (sequential, shared models/)` / `Lane B: step3 (independent)`

3. **Execution order** — which lanes launch in parallel, which wait. Example: "Launch A + B in parallel worktrees. Merge both. Then C."

4. **Conflict flags** — if two parallel lanes touch the same module directory, flag it: "Lanes X and Y both touch module/ — potential merge conflict. Consider sequential execution or careful coordination."

### Completion summary
At the end of the review, fill in and display this summary so the user can see all findings at a glance:
- Step 0: Scope Challenge — ___ (scope accepted as-is / scope reduced per recommendation)
@@ -208,6 +236,7 @@ At the end of the review, fill in and display this summary so the user can see a
- TODOS.md updates: ___ items proposed to user
- Failure modes: ___ critical gaps flagged
- Outside voice: ran (codex/claude) / skipped
- Parallelization: ___ lanes, ___ parallel / ___ sequential
- Lake Score: X/Y recommendations chose complete option

## Retrospective learning

@@ -375,6 +375,7 @@ Before falling back to git diff heuristics, check for richer test plan sources:

1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
```

@@ -55,6 +55,7 @@ Before falling back to git diff heuristics, check for richer test plan sources:

1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
{{SLUG_EVAL}}
ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
```

@@ -442,6 +442,7 @@ If `NEEDS_SETUP`:
**Detect existing test framework and project runtime:**

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
@@ -604,6 +605,7 @@ Before falling back to git diff heuristics, check for richer test plan sources:

1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
```

@@ -96,6 +96,7 @@ Before falling back to git diff heuristics, check for richer test plan sources:

1. **Project-scoped test plans:** Check `~/.gstack/projects/` for recent `*-test-plan-*.md` files for this repo
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
{{SLUG_EVAL}}
ls -t ~/.gstack/projects/$SLUG/*-test-plan-*.md 2>/dev/null | head -1
```

@@ -629,6 +629,7 @@ Count backward from today — how many consecutive days have at least one commit
Before saving the new snapshot, check for prior retro history:

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
ls -t .context/retros/*.json 2>/dev/null
```

@@ -655,6 +656,7 @@ mkdir -p .context/retros

Determine the next sequence number for today (substitute the actual date for `$(date +%Y-%m-%d)`):
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Count existing retros for today to get next sequence number
today=$(date +%Y-%m-%d)
existing=$(ls .context/retros/${today}-*.json 2>/dev/null | wc -l | tr -d ' ')
@@ -778,6 +780,7 @@ Narrative covering:
Check review JSONL logs for plan completion data from /ship runs this period:

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
cat ~/.gstack/projects/$SLUG/*-reviews.jsonl 2>/dev/null | grep '"skill":"ship"' | grep '"plan_items_total"' || echo "NO_PLAN_DATA"
```
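A sketch of pulling the totals out of the matching lines (the sample record is invented; only the field names come from the greps above):

```shell
line='{"skill":"ship","plan_items_total":5,"commit":"abc1234"}'
printf '%s\n' "$line" |
  grep '"skill":"ship"' |
  grep -o '"plan_items_total":[0-9]*' |
  grep -Eo '[0-9]+$'   # prints 5
```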
@@ -1079,6 +1082,7 @@ Considering the full cross-project picture.
### Global Step 8: Load history & compare

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
ls -t ~/.gstack/retros/global-*.json 2>/dev/null | head -5
```

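A sketch of feeding that listing into the comparison step; when no history exists the step degrades to a first-run message (path pattern as above):

```shell
latest=$(ls -t ~/.gstack/retros/global-*.json 2>/dev/null | head -1)
if [ -n "$latest" ]; then
  echo "comparing against $latest"
else
  echo "no prior snapshots; treating this as the first run"
fi
```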
@@ -1096,6 +1100,7 @@ mkdir -p ~/.gstack/retros

Determine the next sequence number for today:
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
today=$(date +%Y-%m-%d)
existing=$(ls ~/.gstack/retros/global-${today}-*.json 2>/dev/null | wc -l | tr -d ' ')
next=$((existing + 1))

@@ -307,6 +307,7 @@ Count backward from today — how many consecutive days have at least one commit
Before saving the new snapshot, check for prior retro history:

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
ls -t .context/retros/*.json 2>/dev/null
```

@@ -333,6 +334,7 @@ mkdir -p .context/retros

Determine the next sequence number for today (substitute the actual date for `$(date +%Y-%m-%d)`):
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Count existing retros for today to get next sequence number
today=$(date +%Y-%m-%d)
existing=$(ls .context/retros/${today}-*.json 2>/dev/null | wc -l | tr -d ' ')
@@ -456,6 +458,7 @@ Narrative covering:
Check review JSONL logs for plan completion data from /ship runs this period:

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
cat ~/.gstack/projects/$SLUG/*-reviews.jsonl 2>/dev/null | grep '"skill":"ship"' | grep '"plan_items_total"' || echo "NO_PLAN_DATA"
```
@@ -757,6 +760,7 @@ Considering the full cross-project picture.
### Global Step 8: Load history & compare

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
ls -t ~/.gstack/retros/global-*.json 2>/dev/null | head -5
```

@@ -774,6 +778,7 @@ mkdir -p ~/.gstack/retros

Determine the next sequence number for today:
```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
today=$(date +%Y-%m-%d)
existing=$(ls ~/.gstack/retros/global-${today}-*.json 2>/dev/null | wc -l | tr -d ' ')
next=$((existing + 1))

@@ -394,6 +394,7 @@ Before reviewing code quality, check: **did they build what was requested — no
2. **Content-based search (fallback):** If no plan file is referenced in conversation context, search by content:

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-')
REPO=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)")
# Search common plan file locations
@@ -621,7 +622,8 @@ If Codex is available, run a lightweight design check on the diff:

```bash
TMPERR_DRL=$(mktemp /tmp/codex-drl-XXXXXXXX)
codex exec "Review the git diff on this branch. Run 7 litmus checks (YES/NO each): 1. Brand/product unmistakable in first screen? 2. One strong visual anchor present? 3. Page understandable by scanning headlines only? 4. Each section has one job? 5. Are cards actually necessary? 6. Does motion improve hierarchy or atmosphere? 7. Would design feel premium with all decorative shadows removed? Flag any hard rejections: 1. Generic SaaS card grid as first impression 2. Beautiful image with weak brand 3. Strong headline with no clear action 4. Busy imagery behind text 5. Sections repeating same mood statement 6. Carousel with no narrative purpose 7. App UI made of stacked cards instead of layout 5 most important design findings only. Reference file:line." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DRL"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Review the git diff on this branch. Run 7 litmus checks (YES/NO each): 1. Brand/product unmistakable in first screen? 2. One strong visual anchor present? 3. Page understandable by scanning headlines only? 4. Each section has one job? 5. Are cards actually necessary? 6. Does motion improve hierarchy or atmosphere? 7. Would design feel premium with all decorative shadows removed? Flag any hard rejections: 1. Generic SaaS card grid as first impression 2. Beautiful image with weak brand 3. Strong headline with no clear action 4. Busy imagery behind text 5. Sections repeating same mood statement 6. Carousel with no narrative purpose 7. App UI made of stacked cards instead of layout 5 most important design findings only. Reference file:line." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DRL"
```

Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
@@ -649,6 +651,7 @@ Before analyzing coverage, detect the project's test framework:
2. **If CLAUDE.md has no testing section, auto-detect:**

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
@@ -979,7 +982,8 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal

```bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
```

Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr:
@@ -1024,6 +1028,8 @@ Claude's structured review already ran. Now run **all three remaining passes** f
**1. Codex structured review (if available):**
```bash
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
```


@@ -10,7 +10,7 @@ const SKIP = new Set(['node_modules', '.git', 'dist']);

function subdirs(root: string): string[] {
  return fs.readdirSync(root, { withFileTypes: true })
    .filter(d => d.isDirectory() && !SKIP.has(d.name))
    .filter(d => d.isDirectory() && !d.name.startsWith('.') && !SKIP.has(d.name))
    .map(d => d.name);
}


@@ -911,68 +911,6 @@ Minimum 0 per category.
12. **Never refuse to use the browser.** When the user invokes /qa or /qa-only, they are requesting browser-based testing. Never suggest evals, unit tests, or other alternatives as a substitute. Even if the diff appears to have no UI changes, backend changes affect app behavior — always open the browser and test.`;
}

function generateDesignReviewLite(ctx: TemplateContext): string {
  const litmusList = OPENAI_LITMUS_CHECKS.map((item, i) => `${i + 1}. ${item}`).join(' ');
  const rejectionList = OPENAI_HARD_REJECTIONS.map((item, i) => `${i + 1}. ${item}`).join(' ');
  // Codex block only for Claude host
  const codexBlock = ctx.host === 'codex' ? '' : `

7. **Codex design voice** (optional, automatic if available):

\`\`\`bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
\`\`\`

If Codex is available, run a lightweight design check on the diff:

\`\`\`bash
TMPERR_DRL=$(mktemp /tmp/codex-drl-XXXXXXXX)
codex exec "Review the git diff on this branch. Run 7 litmus checks (YES/NO each): ${litmusList} Flag any hard rejections: ${rejectionList} 5 most important design findings only. Reference file:line." -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DRL"
\`\`\`

Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
\`\`\`bash
cat "$TMPERR_DRL" && rm -f "$TMPERR_DRL"
\`\`\`

**Error handling:** All errors are non-blocking. On auth failure, timeout, or empty response — skip with a brief note and continue.

Present Codex output under a \`CODEX (design):\` header, merged with the checklist findings above.`;

  return `## Design Review (conditional, diff-scoped)

Check if the diff touches frontend files using \`gstack-diff-scope\`:

\`\`\`bash
source <(${ctx.paths.binDir}/gstack-diff-scope <base> 2>/dev/null)
\`\`\`

**If \`SCOPE_FRONTEND=false\`:** Skip design review silently. No output.

**If \`SCOPE_FRONTEND=true\`:**

1. **Check for DESIGN.md.** If \`DESIGN.md\` or \`design-system.md\` exists in the repo root, read it. All design findings are calibrated against it — patterns blessed in DESIGN.md are not flagged. If not found, use universal design principles.

2. **Read \`.claude/skills/review/design-checklist.md\`.** If the file cannot be read, skip design review with a note: "Design checklist not found — skipping design review."

3. **Read each changed frontend file** (full file, not just diff hunks). Frontend files are identified by the patterns listed in the checklist.

4. **Apply the design checklist** against the changed files. For each item:
- **[HIGH] mechanical CSS fix** (\`outline: none\`, \`!important\`, \`font-size < 16px\`): classify as AUTO-FIX
- **[HIGH/MEDIUM] design judgment needed**: classify as ASK
- **[LOW] intent-based detection**: present as "Possible — verify visually or run /design-review"

5. **Include findings** in the review output under a "Design Review" header, following the output format in the checklist. Design findings merge with code review findings into the same Fix-First flow.

6. **Log the result** for the Review Readiness Dashboard:

\`\`\`bash
${ctx.paths.binDir}/gstack-review-log '{"skill":"design-review-lite","timestamp":"TIMESTAMP","status":"STATUS","findings":N,"auto_fixed":M,"commit":"COMMIT"}'
\`\`\`

Substitute: TIMESTAMP = ISO 8601 datetime, STATUS = "clean" if 0 findings or "issues_found", N = total findings, M = auto-fixed count, COMMIT = output of \`git rev-parse --short HEAD\`.${codexBlock}`;
}

// NOTE: design-checklist.md is a subset of this methodology for code-level detection.
// When adding items here, also update review/design-checklist.md, and vice versa.
function generateDesignMethodology(_ctx: TemplateContext): string {
@@ -2059,450 +1997,6 @@ If a design doc is now found, read it and continue the review.
If none was produced (user may have cancelled), proceed with standard review.`;
}

function generateDesignSketch(_ctx: TemplateContext): string {
  return `## Visual Sketch (UI ideas only)

If the chosen approach involves user-facing UI (screens, pages, forms, dashboards,
or interactive elements), generate a rough wireframe to help the user visualize it.
If the idea is backend-only, infrastructure, or has no UI component — skip this
section silently.

**Step 1: Gather design context**

1. Check if \`DESIGN.md\` exists in the repo root. If it does, read it for design
system constraints (colors, typography, spacing, component patterns). Use these
constraints in the wireframe.
2. Apply core design principles:
- **Information hierarchy** — what does the user see first, second, third?
- **Interaction states** — loading, empty, error, success, partial
- **Edge case paranoia** — what if the name is 47 chars? Zero results? Network fails?
- **Subtraction default** — "as little design as possible" (Rams). Every element earns its pixels.
- **Design for trust** — every interface element builds or erodes user trust.

**Step 2: Generate wireframe HTML**

Generate a single-page HTML file with these constraints:
- **Intentionally rough aesthetic** — use system fonts, thin gray borders, no color,
hand-drawn-style elements. This is a sketch, not a polished mockup.
- Self-contained — no external dependencies, no CDN links, inline CSS only
- Show the core interaction flow (1-3 screens/states max)
- Include realistic placeholder content (not "Lorem ipsum" — use content that
matches the actual use case)
- Add HTML comments explaining design decisions

Write to a temp file:
\`\`\`bash
SKETCH_FILE="/tmp/gstack-sketch-$(date +%s).html"
\`\`\`

**Step 3: Render and capture**

\`\`\`bash
$B goto "file://$SKETCH_FILE"
$B screenshot /tmp/gstack-sketch.png
\`\`\`

If \`$B\` is not available (browse binary not set up), skip the render step. Tell the
user: "Visual sketch requires the browse binary. Run the setup script to enable it."

**Step 4: Present and iterate**

Show the screenshot to the user. Ask: "Does this feel right? Want to iterate on the layout?"

If they want changes, regenerate the HTML with their feedback and re-render.
If they approve or say "good enough," proceed.

**Step 5: Include in design doc**

Reference the wireframe screenshot in the design doc's "Recommended Approach" section.
The screenshot file at \`/tmp/gstack-sketch.png\` can be referenced by downstream skills
(\`/plan-design-review\`, \`/design-review\`) to see what was originally envisioned.

**Step 6: Outside design voices** (optional)

After the wireframe is approved, offer outside design perspectives:

\`\`\`bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
\`\`\`

If Codex is available, use AskUserQuestion:
> "Want outside design perspectives on the chosen approach? Codex proposes a visual thesis, content plan, and interaction ideas. A Claude subagent proposes an alternative aesthetic direction."
>
> A) Yes — get outside design voices
> B) No — proceed without

If user chooses A, launch both voices simultaneously:

1. **Codex** (via Bash, \`model_reasoning_effort="medium"\`):
\`\`\`bash
TMPERR_SKETCH=$(mktemp /tmp/codex-sketch-XXXXXXXX)
codex exec "For this product approach, provide: a visual thesis (one sentence — mood, material, energy), a content plan (hero → support → detail → CTA), and 2 interaction ideas that change page feel. Apply beautiful defaults: composition-first, brand-first, cardless, poster not document. Be opinionated." -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_SKETCH"
\`\`\`
Use a 5-minute timeout (\`timeout: 300000\`). After completion: \`cat "$TMPERR_SKETCH" && rm -f "$TMPERR_SKETCH"\`

2. **Claude subagent** (via Agent tool):
"For this product approach, what design direction would you recommend? What aesthetic, typography, and interaction patterns fit? What would make this approach feel inevitable to the user? Be specific — font names, hex colors, spacing values."

Present Codex output under \`CODEX SAYS (design sketch):\` and subagent output under \`CLAUDE SUBAGENT (design direction):\`.
Error handling: all non-blocking. On failure, skip and continue.`;
}

function generateCodexSecondOpinion(ctx: TemplateContext): string {
  // Codex host: strip entirely — Codex should never invoke itself
  if (ctx.host === 'codex') return '';

  return `## Phase 3.5: Cross-Model Second Opinion (optional)

**Binary check first — no question if unavailable:**

\`\`\`bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
\`\`\`

If \`CODEX_NOT_AVAILABLE\`: skip Phase 3.5 entirely — no message, no AskUserQuestion. Proceed directly to Phase 4.

If \`CODEX_AVAILABLE\`: use AskUserQuestion:

> Want a second opinion from a different AI model? Codex will independently review your problem statement, key answers, premises, and any landscape findings from this session. It hasn't seen this conversation — it gets a structured summary. Usually takes 2-5 minutes.
> A) Yes, get a second opinion
> B) No, proceed to alternatives

If B: skip Phase 3.5 entirely. Remember that Codex did NOT run (affects design doc, founder signals, and Phase 4 below).

**If A: Run the Codex cold read.**

1. Assemble a structured context block from Phases 1-3:
- Mode (Startup or Builder)
- Problem statement (from Phase 1)
- Key answers from Phase 2A/2B (summarize each Q&A in 1-2 sentences, include verbatim user quotes)
- Landscape findings (from Phase 2.75, if search was run)
- Agreed premises (from Phase 3)
- Codebase context (project name, languages, recent activity)

2. **Write the assembled prompt to a temp file** (prevents shell injection from user-derived content):

\`\`\`bash
CODEX_PROMPT_FILE=$(mktemp /tmp/gstack-codex-oh-XXXXXXXX.txt)
\`\`\`

Write the full prompt (context block + instructions) to this file. Use the mode-appropriate variant:

**Startup mode instructions:** "You are an independent technical advisor reading a transcript of a startup brainstorming session. [CONTEXT BLOCK HERE]. Your job: 1) What is the STRONGEST version of what this person is trying to build? Steelman it in 2-3 sentences. 2) What is the ONE thing from their answers that reveals the most about what they should actually build? Quote it and explain why. 3) Name ONE agreed premise you think is wrong, and what evidence would prove you right. 4) If you had 48 hours and one engineer to build a prototype, what would you build? Be specific — tech stack, features, what you'd skip. Be direct. Be terse. No preamble."

**Builder mode instructions:** "You are an independent technical advisor reading a transcript of a builder brainstorming session. [CONTEXT BLOCK HERE]. Your job: 1) What is the COOLEST version of this they haven't considered? 2) What's the ONE thing from their answers that reveals what excites them most? Quote it. 3) What existing open source project or tool gets them 50% of the way there — and what's the 50% they'd need to build? 4) If you had a weekend to build this, what would you build first? Be specific. Be direct. No preamble."

3. Run Codex:
|
||||
|
||||
\`\`\`bash
|
||||
TMPERR_OH=$(mktemp /tmp/codex-oh-err-XXXXXXXX)
|
||||
codex exec "$(cat "$CODEX_PROMPT_FILE")" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_OH"
|
||||
\`\`\`
|
||||
|
||||
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
|
||||
\`\`\`bash
|
||||
cat "$TMPERR_OH"
|
||||
rm -f "$TMPERR_OH" "$CODEX_PROMPT_FILE"
|
||||
\`\`\`
|
||||
|
||||
**Error handling:** All errors are non-blocking — Codex second opinion is a quality enhancement, not a prerequisite.
|
||||
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key": "Codex authentication failed. Run \\\`codex login\\\` to authenticate. Skipping second opinion."
|
||||
- **Timeout:** "Codex timed out after 5 minutes. Skipping second opinion."
|
||||
- **Empty response:** "Codex returned no response. Stderr: <paste relevant error>. Skipping second opinion."
|
||||
|
||||
On any error, proceed to Phase 4 — do NOT fall back to a Claude subagent (this is brainstorming, not adversarial review).
|
||||
|
||||
4. **Presentation:**
|
||||
|
||||
\`\`\`
|
||||
SECOND OPINION (Codex):
|
||||
════════════════════════════════════════════════════════════
|
||||
<full codex output, verbatim — do not truncate or summarize>
|
||||
════════════════════════════════════════════════════════════
|
||||
\`\`\`
|
||||
|
||||
5. **Cross-model synthesis:** After presenting Codex output, provide 3-5 bullet synthesis:
|
||||
- Where Claude agrees with Codex
|
||||
- Where Claude disagrees and why
|
||||
- Whether Codex's challenged premise changes Claude's recommendation
|
||||
|
||||
6. **Premise revision check:** If Codex challenged an agreed premise, use AskUserQuestion:
|
||||
|
||||
> Codex challenged premise #{N}: "{premise text}". Their argument: "{reasoning}".
|
||||
> A) Revise this premise based on Codex's input
|
||||
> B) Keep the original premise — proceed to alternatives
|
||||
|
||||
If A: revise the premise and note the revision. If B: proceed (and note that the user defended this premise with reasoning — this is a founder signal if they articulate WHY they disagree, not just dismiss).`;
|
||||
}

function generateAdversarialStep(ctx: TemplateContext): string {
// Codex host: strip entirely — Codex should never invoke itself
if (ctx.host === 'codex') return '';

const isShip = ctx.skillName === 'ship';
const stepNum = isShip ? '3.8' : '5.7';

return `## Step ${stepNum}: Adversarial review (auto-scaled)

Adversarial review thoroughness scales automatically based on diff size. No configuration needed.

**Detect diff size and tool availability:**

\`\`\`bash
DIFF_INS=$(git diff origin/<base> --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
DIFF_DEL=$(git diff origin/<base> --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
DIFF_TOTAL=$((DIFF_INS + DIFF_DEL))
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
# Respect old opt-out
OLD_CFG=$(~/.claude/skills/gstack/bin/gstack-config get codex_reviews 2>/dev/null || true)
echo "DIFF_SIZE: $DIFF_TOTAL"
echo "OLD_CFG: \${OLD_CFG:-not_set}"
\`\`\`

If \`OLD_CFG\` is \`disabled\`: skip this step silently. Continue to the next step.

**User override:** If the user explicitly requested a specific tier (e.g., "run all passes", "paranoid review", "full adversarial", "do all 4 passes", "thorough review"), honor that request regardless of diff size. Jump to the matching tier section.

**Auto-select tier based on diff size:**
- **Small (< 50 lines changed):** Skip adversarial review entirely. Print: "Small diff ($DIFF_TOTAL lines) — adversarial review skipped." Continue to the next step.
- **Medium (50–199 lines changed):** Run Codex adversarial challenge (or Claude adversarial subagent if Codex unavailable). Jump to the "Medium tier" section.
- **Large (200+ lines changed):** Run all remaining passes — Codex structured review + Claude adversarial subagent + Codex adversarial. Jump to the "Large tier" section.
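The three thresholds above can be sketched as a small branch (a hedged illustration; \`TIER\` is a hypothetical helper variable, not something the skill itself sets):

\`\`\`bash
# Illustrative tier selection: maps DIFF_TOTAL to a tier name
DIFF_TOTAL=120
if [ "$DIFF_TOTAL" -lt 50 ]; then
  TIER="small"
elif [ "$DIFF_TOTAL" -lt 200 ]; then
  TIER="medium"
else
  TIER="large"
fi
echo "TIER: $TIER"
\`\`\`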

---

### Medium tier (50–199 lines)

Claude's structured review already ran. Now add a **cross-model adversarial challenge**.

**If Codex is available:** run the Codex adversarial challenge. **If Codex is NOT available:** fall back to the Claude adversarial subagent instead.

**Codex adversarial:**

\`\`\`bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
\`\`\`

Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. After the command completes, read stderr:
\`\`\`bash
cat "$TMPERR_ADV"
\`\`\`

Present the full output verbatim. This is informational — it never blocks shipping.

**Error handling:** All errors are non-blocking — adversarial review is a quality enhancement, not a prerequisite.
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key": "Codex authentication failed. Run \\\`codex login\\\` to authenticate."
- **Timeout:** "Codex timed out after 5 minutes."
- **Empty response:** "Codex returned no response. Stderr: <paste relevant error>."

On any Codex error, fall back to the Claude adversarial subagent automatically.

**Claude adversarial subagent** (fallback when Codex unavailable or errored):

Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.

Subagent prompt:
"Read the diff for this branch with \`git diff origin/<base>\`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment)."

Present findings under an \`ADVERSARIAL REVIEW (Claude subagent):\` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.

If the subagent fails or times out: "Claude adversarial subagent unavailable. Continuing without adversarial review."

**Persist the review result:**
\`\`\`bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"medium","commit":"'"$(git rev-parse --short HEAD)"'"}'
\`\`\`
Substitute STATUS: "clean" if no findings, "issues_found" if findings exist. SOURCE: "codex" if Codex ran, "claude" if subagent ran. If both failed, do NOT persist.
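As a hedged sketch (not one of the skill's required commands), the substitution can be done with shell variables instead of hand-editing the JSON; \`STATUS\` and \`SOURCE\` here are placeholder values:

\`\`\`bash
# Illustrative variable-based substitution for the review-log payload
STATUS="clean"    # or "issues_found"
SOURCE="codex"    # or "claude"
JSON_PAYLOAD='{"skill":"adversarial-review","status":"'"$STATUS"'","source":"'"$SOURCE"'","tier":"medium"}'
echo "$JSON_PAYLOAD"
\`\`\`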

**Cleanup:** Run \`rm -f "$TMPERR_ADV"\` after processing (if Codex was used).

---

### Large tier (200+ lines)

Claude's structured review already ran. Now run **all three remaining passes** for maximum coverage:

**1. Codex structured review (if available):**
\`\`\`bash
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
\`\`\`

Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. Present output under \`CODEX SAYS (code review):\` header.
Check for \`[P1]\` markers: found → \`GATE: FAIL\`, not found → \`GATE: PASS\`.
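The gate check can be sketched as a fixed-string grep over the captured review output (illustrative; \`CODEX_OUT\` is a hypothetical variable holding the codex output):

\`\`\`bash
# Illustrative gate: any [P1] marker in the review output fails the gate
CODEX_OUT="[P1] unchecked error path in deploy script"
if echo "$CODEX_OUT" | grep -qF '[P1]'; then
  GATE="FAIL"
else
  GATE="PASS"
fi
echo "GATE: $GATE"
\`\`\`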

If GATE is FAIL, use AskUserQuestion:
\`\`\`
Codex found N critical issues in the diff.

A) Investigate and fix now (recommended)
B) Continue — review will still complete
\`\`\`

If A: address the findings${isShip ? '. After fixing, re-run tests (Step 3) since code has changed' : ''}. Re-run \`codex review\` to verify.

Read stderr for errors (same error handling as medium tier).

After stderr: \`rm -f "$TMPERR"\`

**2. Claude adversarial subagent:** Dispatch a subagent with the adversarial prompt (same prompt as medium tier). This always runs regardless of Codex availability.

**3. Codex adversarial challenge (if available):** Run \`codex exec\` with the adversarial prompt (same as medium tier).

If Codex is not available for steps 1 and 3, note to the user: "Codex CLI not found — large-diff review ran Claude structured + Claude adversarial (2 of 4 passes). Install Codex for full 4-pass coverage: \`npm install -g @openai/codex\`"

**Persist the review result AFTER all passes complete** (not after each sub-step):
\`\`\`bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"large","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
\`\`\`
Substitute: STATUS = "clean" if no findings across ALL passes, "issues_found" if any pass found issues. SOURCE = "both" if Codex ran, "claude" if only Claude subagent ran. GATE = the Codex structured review gate result ("pass"/"fail"), or "informational" if Codex was unavailable. If all passes failed, do NOT persist.

---

### Cross-model synthesis (medium and large tiers)

After all passes complete, synthesize findings across all sources:

\`\`\`
ADVERSARIAL REVIEW SYNTHESIS (auto: TIER, N lines):
════════════════════════════════════════════════════════════
High confidence (found by multiple sources): [findings agreed on by >1 pass]
Unique to Claude structured review: [from earlier step]
Unique to Claude adversarial: [from subagent, if ran]
Unique to Codex: [from codex adversarial or code review, if ran]
Models used: Claude structured ✓ Claude adversarial ✓/✗ Codex ✓/✗
════════════════════════════════════════════════════════════
\`\`\`

High-confidence findings (agreed on by multiple sources) should be prioritized for fixes.

---`;
}

function generateCodexPlanReview(ctx: TemplateContext): string {
// Codex host: strip entirely — Codex should never invoke itself
if (ctx.host === 'codex') return '';

return `## Outside Voice — Independent Plan Challenge (optional, recommended)

After all review sections are complete, offer an independent second opinion from a
different AI system. Two models agreeing on a plan is stronger signal than one model's
thorough review.

**Check tool availability:**

\`\`\`bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
\`\`\`

Use AskUserQuestion:

> "All review sections are complete. Want an outside voice? A different AI system can
> give a brutally honest, independent challenge of this plan — logical gaps, feasibility
> risks, and blind spots that are hard to catch from inside the review. Takes about 2
> minutes."
>
> RECOMMENDATION: Choose A — an independent second opinion catches structural blind
> spots. Two different AI models agreeing on a plan is stronger signal than one model's
> thorough review. Completeness: A=9/10, B=7/10.

Options:
- A) Get the outside voice (recommended)
- B) Skip — proceed to outputs

**If B:** Print "Skipping outside voice." and continue to the next section.

**If A:** Construct the plan review prompt. Read the plan file being reviewed (the file
the user pointed this review at, or the branch diff scope). If a CEO plan document
was written in Step 0D-POST, read that too — it contains the scope decisions and vision.

Construct this prompt (substitute the actual plan content — if plan content exceeds 30KB,
truncate to the first 30KB and note "Plan truncated for size"):

"You are a brutally honest technical reviewer examining a development plan that has
already been through a multi-section review. Your job is NOT to repeat that review.
Instead, find what it missed. Look for: logical gaps and unstated assumptions that
survived the review scrutiny, overcomplexity (is there a fundamentally simpler
approach the review was too deep in the weeds to see?), feasibility risks the review
took for granted, missing dependencies or sequencing issues, and strategic
miscalibration (is this the right thing to build at all?). Be direct. Be terse. No
compliments. Just the problems.

THE PLAN:
<plan content>"
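The 30KB truncation described above can be sketched as follows (illustrative; \`PLAN_FILE\` is a stand-in for the actual plan path, and the sample file is generated just for the sketch):

\`\`\`bash
# Illustrative truncation guard: cap plan content at 30KB before embedding it in the prompt
PLAN_FILE=$(mktemp)                       # stand-in for the real plan path
yes x | head -c 40000 > "$PLAN_FILE"      # 40KB sample body
PLAN_LIMIT=30720                          # 30KB
PLAN_SIZE=$(wc -c < "$PLAN_FILE")
if [ "$PLAN_SIZE" -gt "$PLAN_LIMIT" ]; then
  PLAN_CONTENT=$(head -c "$PLAN_LIMIT" "$PLAN_FILE")
  PLAN_NOTE="Plan truncated for size"
else
  PLAN_CONTENT=$(cat "$PLAN_FILE")
  PLAN_NOTE=""
fi
echo "$PLAN_NOTE"
rm -f "$PLAN_FILE"
\`\`\`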

**If CODEX_AVAILABLE:**

\`\`\`bash
TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
codex exec "<prompt>" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV"
\`\`\`

Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
\`\`\`bash
cat "$TMPERR_PV"
\`\`\`

Present the full output verbatim:

\`\`\`
CODEX SAYS (plan review — outside voice):
════════════════════════════════════════════════════════════
<full codex output, verbatim — do not truncate or summarize>
════════════════════════════════════════════════════════════
\`\`\`

**Error handling:** All errors are non-blocking — the outside voice is informational.
- Auth failure (stderr contains "auth", "login", "unauthorized"): "Codex auth failed. Run \\\`codex login\\\` to authenticate."
- Timeout: "Codex timed out after 5 minutes."
- Empty response: "Codex returned no response."

On any Codex error, fall back to the Claude adversarial subagent.

**If CODEX_NOT_AVAILABLE (or Codex errored):**

Dispatch via the Agent tool. The subagent has fresh context — genuine independence.

Subagent prompt: same plan review prompt as above.

Present findings under an \`OUTSIDE VOICE (Claude subagent):\` header.

If the subagent fails or times out: "Outside voice unavailable. Continuing to outputs."

**Cross-model tension:**

After presenting the outside voice findings, note any points where the outside voice
disagrees with the review findings from earlier sections. Flag these as:

\`\`\`
CROSS-MODEL TENSION:
[Topic]: Review said X. Outside voice says Y. [Your assessment of who's right.]
\`\`\`

For each substantive tension point, auto-propose as a TODO via AskUserQuestion:

> "Cross-model disagreement on [topic]. The review found [X] but the outside voice
> argues [Y]. Worth investigating further?"

Options:
- A) Add to TODOS.md
- B) Skip — not substantive

If no tension points exist, note: "No cross-model tension — both reviewers agree."

**Persist the result:**
\`\`\`bash
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-plan-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","commit":"'"$(git rev-parse --short HEAD)"'"}'
\`\`\`

Substitute: STATUS = "clean" if no findings, "issues_found" if findings exist.
SOURCE = "codex" if Codex ran, "claude" if subagent ran.

**Cleanup:** Run \`rm -f "$TMPERR_PV"\` after processing (if Codex was used).

---`;
}

function generateDeployBootstrap(_ctx: TemplateContext): string {
return `\`\`\`bash
# Check for persisted deploy config in CLAUDE.md
@@ -2539,195 +2033,6 @@ in the decision tree below.
If you want to persist deploy settings for future runs, suggest the user run \`/setup-deploy\`.`;
}

// ─── Design Outside Voices (parallel Codex + Claude subagent) ───────

function generateDesignOutsideVoices(ctx: TemplateContext): string {
// Codex host: strip entirely — Codex should never invoke itself
if (ctx.host === 'codex') return '';

const rejectionList = OPENAI_HARD_REJECTIONS.map((item, i) => `${i + 1}. ${item}`).join('\n');
const litmusList = OPENAI_LITMUS_CHECKS.map((item, i) => `${i + 1}. ${item}`).join('\n');

// Skill-specific configuration
const isPlanDesignReview = ctx.skillName === 'plan-design-review';
const isDesignReview = ctx.skillName === 'design-review';
const isDesignConsultation = ctx.skillName === 'design-consultation';

// Determine opt-in behavior and reasoning effort
const isAutomatic = isDesignReview; // design-review runs automatically
const reasoningEffort = isDesignConsultation ? 'medium' : 'high'; // creative vs analytical

// Build skill-specific Codex prompt
let codexPrompt: string;
let subagentPrompt: string;

if (isPlanDesignReview) {
codexPrompt = `Read the plan file at [plan-file-path]. Evaluate this plan's UI/UX design against these criteria.

HARD REJECTION — flag if ANY apply:
${rejectionList}

LITMUS CHECKS — answer YES or NO for each:
${litmusList}

HARD RULES — first classify as MARKETING/LANDING PAGE vs APP UI vs HYBRID, then flag violations of the matching rule set:
- MARKETING: First viewport as one composition, brand-first hierarchy, full-bleed hero, 2-3 intentional motions, composition-first layout
- APP UI: Calm surface hierarchy, dense but readable, utility language, minimal chrome
- UNIVERSAL: CSS variables for colors, no default font stacks, one job per section, cards earn existence

For each finding: what's wrong, what will happen if it ships unresolved, and the specific fix. Be opinionated. No hedging.`;

subagentPrompt = `Read the plan file at [plan-file-path]. You are an independent senior product designer reviewing this plan. You have NOT seen any prior review. Evaluate:

1. Information hierarchy: what does the user see first, second, third? Is it right?
2. Missing states: loading, empty, error, success, partial — which are unspecified?
3. User journey: what's the emotional arc? Where does it break?
4. Specificity: does the plan describe SPECIFIC UI ("48px Söhne Bold header, #1a1a1a on white") or generic patterns ("clean modern card-based layout")?
5. What design decisions will haunt the implementer if left ambiguous?

For each finding: what's wrong, severity (critical/high/medium), and the fix.`;
} else if (isDesignReview) {
codexPrompt = `Review the frontend source code in this repo. Evaluate against these design hard rules:
- Spacing: systematic (design tokens / CSS variables) or magic numbers?
- Typography: expressive purposeful fonts or default stacks?
- Color: CSS variables with defined system, or hardcoded hex scattered?
- Responsive: breakpoints defined? calc(100svh - header) for heroes? Mobile tested?
- A11y: ARIA landmarks, alt text, contrast ratios, 44px touch targets?
- Motion: 2-3 intentional animations, or zero / ornamental only?
- Cards: used only when card IS the interaction? No decorative card grids?

First classify as MARKETING/LANDING PAGE vs APP UI vs HYBRID, then apply matching rules.

LITMUS CHECKS — answer YES/NO:
${litmusList}

HARD REJECTION — flag if ANY apply:
${rejectionList}

Be specific. Reference file:line for every finding.`;

subagentPrompt = `Review the frontend source code in this repo. You are an independent senior product designer doing a source-code design audit. Focus on CONSISTENCY PATTERNS across files rather than individual violations:
- Are spacing values systematic across the codebase?
- Is there ONE color system or scattered approaches?
- Do responsive breakpoints follow a consistent set?
- Is the accessibility approach consistent or spotty?

For each finding: what's wrong, severity (critical/high/medium), and the file:line.`;
} else if (isDesignConsultation) {
codexPrompt = `Given this product context, propose a complete design direction:
- Visual thesis: one sentence describing mood, material, and energy
- Typography: specific font names (not defaults — no Inter/Roboto/Arial/system) + hex colors
- Color system: CSS variables for background, surface, primary text, muted text, accent
- Layout: composition-first, not component-first. First viewport as poster, not document
- Differentiation: 2 deliberate departures from category norms
- Anti-slop: no purple gradients, no 3-column icon grids, no centered everything, no decorative blobs

Be opinionated. Be specific. Do not hedge. This is YOUR design direction — own it.`;

subagentPrompt = `Given this product context, propose a design direction that would SURPRISE. What would the cool indie studio do that the enterprise UI team wouldn't?
- Propose an aesthetic direction, typography stack (specific font names), color palette (hex values)
- 2 deliberate departures from category norms
- What emotional reaction should the user have in the first 3 seconds?

Be bold. Be specific. No hedging.`;
} else {
// Unknown skill — return empty
return '';
}

// Build the opt-in section
const optInSection = isAutomatic ? `
**Automatic:** Outside voices run automatically when Codex is available. No opt-in needed.` : `
Use AskUserQuestion:
> "Want outside design voices${isPlanDesignReview ? ' before the detailed review' : ''}? Codex evaluates against OpenAI's design hard rules + litmus checks; Claude subagent does an independent ${isDesignConsultation ? 'design direction proposal' : 'completeness review'}."
>
> A) Yes — run outside design voices
> B) No — proceed without

If user chooses B, skip this step and continue.`;

// Build the synthesis section
const synthesisSection = isPlanDesignReview ? `
**Synthesis — Litmus scorecard:**

\`\`\`
DESIGN OUTSIDE VOICES — LITMUS SCORECARD:
═══════════════════════════════════════════════════════════════
Check                                    Claude  Codex   Consensus
─────────────────────────────────────── ─────── ─────── ─────────
1. Brand unmistakable in first screen?   —       —       —
2. One strong visual anchor?             —       —       —
3. Scannable by headlines only?          —       —       —
4. Each section has one job?             —       —       —
5. Cards actually necessary?             —       —       —
6. Motion improves hierarchy?            —       —       —
7. Premium without decorative shadows?   —       —       —
─────────────────────────────────────── ─────── ─────── ─────────
Hard rejections triggered:               —       —       —
═══════════════════════════════════════════════════════════════
\`\`\`

Fill in each cell from the Codex and subagent outputs. CONFIRMED = both agree. DISAGREE = models differ. NOT SPEC'D = not enough info to evaluate.

**Pass integration (respects existing 7-pass contract):**
- Hard rejections → raised as the FIRST items in Pass 1, tagged \`[HARD REJECTION]\`
- Litmus DISAGREE items → raised in the relevant pass with both perspectives
- Litmus CONFIRMED failures → pre-loaded as known issues in the relevant pass
- Passes can skip discovery and go straight to fixing for pre-identified issues` :
isDesignConsultation ? `
**Synthesis:** Claude main references both Codex and subagent proposals in the Phase 3 proposal. Present:
- Areas of agreement between all three voices (Claude main + Codex + subagent)
- Genuine divergences as creative alternatives for the user to choose from
- "Codex and I agree on X. Codex suggested Y where I'm proposing Z — here's why..."` : `
**Synthesis — Litmus scorecard:**

Use the same scorecard format as /plan-design-review (shown above). Fill in from both outputs.
Merge findings into the triage with \`[codex]\` / \`[subagent]\` / \`[cross-model]\` tags.`;

const escapedCodexPrompt = codexPrompt.replace(/`/g, '\\`').replace(/\$/g, '\\$');

return `## Design Outside Voices (parallel)
${optInSection}

**Check Codex availability:**
\`\`\`bash
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
\`\`\`

**If Codex is available**, launch both voices simultaneously:

1. **Codex design voice** (via Bash):
\`\`\`bash
TMPERR_DESIGN=$(mktemp /tmp/codex-design-XXXXXXXX)
codex exec "${escapedCodexPrompt}" -s read-only -c 'model_reasoning_effort="${reasoningEffort}"' --enable web_search_cached 2>"$TMPERR_DESIGN"
\`\`\`
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
\`\`\`bash
cat "$TMPERR_DESIGN" && rm -f "$TMPERR_DESIGN"
\`\`\`

2. **Claude design subagent** (via Agent tool):
Dispatch a subagent with this prompt:
"${subagentPrompt}"

**Error handling (all non-blocking):**
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key": "Codex authentication failed. Run \`codex login\` to authenticate."
- **Timeout:** "Codex timed out after 5 minutes."
- **Empty response:** "Codex returned no response."
- On any Codex error: proceed with Claude subagent output only, tagged \`[single-model]\`.
- If Claude subagent also fails: "Outside voices unavailable — continuing with primary review."

Present Codex output under a \`CODEX SAYS (design ${isPlanDesignReview ? 'critique' : isDesignReview ? 'source audit' : 'direction'}):\` header.
Present subagent output under a \`CLAUDE SUBAGENT (design ${isPlanDesignReview ? 'completeness' : isDesignReview ? 'consistency' : 'direction'}):\` header.
${synthesisSection}

**Log the result:**
\`\`\`bash
${ctx.paths.binDir}/gstack-review-log '{"skill":"design-outside-voices","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","commit":"'"$(git rev-parse --short HEAD)"'"}'
\`\`\`
Replace STATUS with "clean" or "issues_found", SOURCE with "codex+subagent", "codex-only", "subagent-only", or "unavailable".`;
}
|
||||
|
||||
// ─── Design Hard Rules (OpenAI framework + gstack slop blacklist) ───
|
||||
|
||||
function generateDesignHardRules(_ctx: TemplateContext): string {
|
||||
|
||||
@@ -17,7 +17,8 @@ If Codex is available, run a lightweight design check on the diff:
|
||||
|
||||
\`\`\`bash
|
||||
TMPERR_DRL=$(mktemp /tmp/codex-drl-XXXXXXXX)
|
||||
codex exec "Review the git diff on this branch. Run 7 litmus checks (YES/NO each): ${litmusList} Flag any hard rejections: ${rejectionList} 5 most important design findings only. Reference file:line." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DRL"
|
||||
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
|
||||
codex exec "Review the git diff on this branch. Run 7 litmus checks (YES/NO each): ${litmusList} Flag any hard rejections: ${rejectionList} 5 most important design findings only. Reference file:line." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DRL"
|
||||
\`\`\`
|
||||
|
||||
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
|
||||
@@ -467,7 +468,8 @@ If user chooses A, launch both voices simultaneously:
|
||||
1. **Codex** (via Bash, \`model_reasoning_effort="medium"\`):
|
||||
\`\`\`bash
|
||||
TMPERR_SKETCH=$(mktemp /tmp/codex-sketch-XXXXXXXX)
|
||||
codex exec "For this product approach, provide: a visual thesis (one sentence — mood, material, energy), a content plan (hero → support → detail → CTA), and 2 interaction ideas that change page feel. Apply beautiful defaults: composition-first, brand-first, cardless, poster not document. Be opinionated." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_SKETCH"
|
||||
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
|
||||
codex exec "For this product approach, provide: a visual thesis (one sentence — mood, material, energy), a content plan (hero → support → detail → CTA), and 2 interaction ideas that change page feel. Apply beautiful defaults: composition-first, brand-first, cardless, poster not document. Be opinionated." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="medium"' --enable web_search_cached 2>"$TMPERR_SKETCH"
|
||||
\`\`\`
|
||||
Use a 5-minute timeout (\`timeout: 300000\`). After completion: \`cat "$TMPERR_SKETCH" && rm -f "$TMPERR_SKETCH"\`
|
||||
|
||||
@@ -636,7 +638,8 @@ which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
1. **Codex design voice** (via Bash):
\`\`\`bash
TMPERR_DESIGN=$(mktemp /tmp/codex-design-XXXXXXXX)
codex exec "${escapedCodexPrompt}" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="${reasoningEffort}"' --enable web_search_cached 2>"$TMPERR_DESIGN"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "${escapedCodexPrompt}" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="${reasoningEffort}"' --enable web_search_cached 2>"$TMPERR_DESIGN"
\`\`\`
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
\`\`\`bash

@@ -12,7 +12,7 @@ import { generateCommandReference, generateSnapshotFlags, generateBrowseSetup }
import { generateDesignMethodology, generateDesignHardRules, generateDesignOutsideVoices, generateDesignReviewLite, generateDesignSketch } from './design';
import { generateTestBootstrap, generateTestCoverageAuditPlan, generateTestCoverageAuditShip, generateTestCoverageAuditReview } from './testing';
import { generateReviewDashboard, generatePlanFileReviewReport, generateSpecReviewLoop, generateBenefitsFrom, generateCodexSecondOpinion, generateAdversarialStep, generateCodexPlanReview, generatePlanCompletionAuditShip, generatePlanCompletionAuditReview, generatePlanVerificationExec } from './review';
import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology } from './utility';
import { generateSlugEval, generateSlugSetup, generateBaseBranchDetect, generateDeployBootstrap, generateQAMethodology, generateCoAuthorTrailer } from './utility';

export const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
  SLUG_EVAL: generateSlugEval,
@@ -44,4 +44,5 @@ export const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
  PLAN_COMPLETION_AUDIT_SHIP: generatePlanCompletionAuditShip,
  PLAN_COMPLETION_AUDIT_REVIEW: generatePlanCompletionAuditReview,
  PLAN_VERIFICATION_EXEC: generatePlanVerificationExec,
  CO_AUTHOR_TRAILER: generateCoAuthorTrailer,
};
@@ -233,6 +233,7 @@ If the Read fails (file not found), say:

After /${first} completes, re-run the design doc check:
\`\`\`bash
setopt +o nomatch 2>/dev/null || true # zsh compat
SLUG=$(~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-' || echo 'no-branch')
DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
@@ -292,7 +293,8 @@ Write the full prompt (context block + instructions) to this file. Use the mode-

\`\`\`bash
TMPERR_OH=$(mktemp /tmp/codex-oh-err-XXXXXXXX)
codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_OH"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_OH"
\`\`\`

Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
@@ -376,7 +378,8 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal

\`\`\`bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
\`\`\`

Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. After the command completes, read stderr:
@@ -421,6 +424,8 @@ Claude's structured review already ran. Now run **all three remaining passes** f
**1. Codex structured review (if available):**
\`\`\`bash
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
\`\`\`

@@ -531,7 +536,8 @@ THE PLAN:

\`\`\`bash
TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "<prompt>" -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_PV"
\`\`\`

Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
@@ -609,6 +615,7 @@ function generatePlanFileDiscovery(): string {
2. **Content-based search (fallback):** If no plan file is referenced in conversation context, search by content:

\`\`\`bash
setopt +o nomatch 2>/dev/null || true # zsh compat
BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-')
REPO=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)")
# Search common plan file locations
@@ -6,6 +6,7 @@ export function generateTestBootstrap(_ctx: TemplateContext): string {
**Detect existing test framework and project runtime:**

\`\`\`bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
@@ -200,6 +201,7 @@ Before analyzing coverage, detect the project's test framework:
2. **If CLAUDE.md has no testing section, auto-detect:**

\`\`\`bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"

@@ -72,7 +72,7 @@ fi
([ -f railway.json ] || [ -f railway.toml ]) && echo "PLATFORM:railway"

# Detect deploy workflows
for f in .github/workflows/*.yml .github/workflows/*.yaml; do
for f in $(find .github/workflows -maxdepth 1 \\( -name '*.yml' -o -name '*.yaml' \\) 2>/dev/null); do
[ -f "$f" ] && grep -qiE "deploy|release|production|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f"
[ -f "$f" ] && grep -qiE "staging" "$f" 2>/dev/null && echo "STAGING_WORKFLOW:$f"
done
@@ -365,3 +365,10 @@ Minimum 0 per category.
11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.
12. **Never refuse to use the browser.** When the user invokes /qa or /qa-only, they are requesting browser-based testing. Never suggest evals, unit tests, or other alternatives as a substitute. Even if the diff appears to have no UI changes, backend changes affect app behavior — always open the browser and test.`;
}

export function generateCoAuthorTrailer(ctx: TemplateContext): string {
  if (ctx.host === 'codex') {
    return 'Co-Authored-By: OpenAI Codex <noreply@openai.com>';
  }
  return 'Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>';
}

@@ -349,13 +349,13 @@ Run the platform detection from the deploy bootstrap:
[ -f railway.json ] || [ -f railway.toml ] && echo "PLATFORM:railway"

# GitHub Actions deploy workflows
for f in .github/workflows/*.yml .github/workflows/*.yaml; do
for f in $(find .github/workflows -maxdepth 1 \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null); do
[ -f "$f" ] && grep -qiE "deploy|release|production|staging|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f"
done

# Project type
[ -f package.json ] && grep -q '"bin"' package.json 2>/dev/null && echo "PROJECT_TYPE:cli"
ls *.gemspec 2>/dev/null && echo "PROJECT_TYPE:library"
find . -maxdepth 1 -name '*.gemspec' 2>/dev/null | grep -q . && echo "PROJECT_TYPE:library"
```

### Step 3: Platform-specific setup
@@ -64,13 +64,13 @@ Run the platform detection from the deploy bootstrap:
[ -f railway.json ] || [ -f railway.toml ] && echo "PLATFORM:railway"

# GitHub Actions deploy workflows
for f in .github/workflows/*.yml .github/workflows/*.yaml; do
for f in $(find .github/workflows -maxdepth 1 \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null); do
[ -f "$f" ] && grep -qiE "deploy|release|production|staging|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f"
done

# Project type
[ -f package.json ] && grep -q '"bin"' package.json 2>/dev/null && echo "PROJECT_TYPE:cli"
ls *.gemspec 2>/dev/null && echo "PROJECT_TYPE:library"
find . -maxdepth 1 -name '*.gemspec' 2>/dev/null | grep -q . && echo "PROJECT_TYPE:library"
```

### Step 3: Platform-specific setup
@@ -515,6 +515,7 @@ git fetch origin <base> && git merge origin/<base> --no-edit
**Detect existing test framework and project runtime:**

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
@@ -867,6 +868,7 @@ Before analyzing coverage, detect the project's test framework:
2. **If CLAUDE.md has no testing section, auto-detect:**

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
# Detect project runtime
[ -f Gemfile ] && echo "RUNTIME:ruby"
[ -f package.json ] && echo "RUNTIME:node"
@@ -1125,6 +1127,7 @@ Repo: {owner/repo}
2. **Content-based search (fallback):** If no plan file is referenced in conversation context, search by content:

```bash
setopt +o nomatch 2>/dev/null || true # zsh compat
BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-')
REPO=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)")
# Search common plan file locations
@@ -1348,7 +1351,8 @@ If Codex is available, run a lightweight design check on the diff:

```bash
TMPERR_DRL=$(mktemp /tmp/codex-drl-XXXXXXXX)
codex exec "Review the git diff on this branch. Run 7 litmus checks (YES/NO each): 1. Brand/product unmistakable in first screen? 2. One strong visual anchor present? 3. Page understandable by scanning headlines only? 4. Each section has one job? 5. Are cards actually necessary? 6. Does motion improve hierarchy or atmosphere? 7. Would design feel premium with all decorative shadows removed? Flag any hard rejections: 1. Generic SaaS card grid as first impression 2. Beautiful image with weak brand 3. Strong headline with no clear action 4. Busy imagery behind text 5. Sections repeating same mood statement 6. Carousel with no narrative purpose 7. App UI made of stacked cards instead of layout 5 most important design findings only. Reference file:line." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DRL"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Review the git diff on this branch. Run 7 litmus checks (YES/NO each): 1. Brand/product unmistakable in first screen? 2. One strong visual anchor present? 3. Page understandable by scanning headlines only? 4. Each section has one job? 5. Are cards actually necessary? 6. Does motion improve hierarchy or atmosphere? 7. Would design feel premium with all decorative shadows removed? Flag any hard rejections: 1. Generic SaaS card grid as first impression 2. Beautiful image with weak brand 3. Strong headline with no clear action 4. Busy imagery behind text 5. Sections repeating same mood statement 6. Carousel with no narrative purpose 7. App UI made of stacked cards instead of layout 5 most important design findings only. Reference file:line." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_DRL"
```

Use a 5-minute timeout (`timeout: 300000`). After the command completes, read stderr:
@@ -1470,7 +1474,8 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal

```bash
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
```

Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr:
@@ -1515,6 +1520,8 @@ Claude's structured review already ran. Now run **all three remaining passes** f
**1. Codex structured review (if available):**
```bash
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
_REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
cd "$_REPO_ROOT"
codex review --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
```
@@ -465,7 +465,7 @@ Save this summary — it goes into the PR body in Step 8.
git commit -m "$(cat <<'EOF'
chore: bump version and changelog (vX.Y.Z.W)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
{{CO_AUTHOR_TRAILER}}
EOF
)"
```

@@ -3,6 +3,7 @@ import { COMMAND_DESCRIPTIONS } from '../browse/src/commands';
import { SNAPSHOT_FLAGS } from '../browse/src/snapshot';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';

const ROOT = path.resolve(import.meta.dir, '..');
const MAX_SKILL_DESCRIPTION_LENGTH = 1024;
@@ -262,6 +263,43 @@ describe('gen-skill-docs', () => {
    }
  });

  test('bash blocks with shell globs are zsh-safe (setopt guard or find)', () => {
    for (const skill of ALL_SKILLS) {
      const content = fs.readFileSync(path.join(ROOT, skill.dir, 'SKILL.md'), 'utf-8');
      const bashBlocks = [...content.matchAll(/```bash\n([\s\S]*?)```/g)].map(m => m[1]);

      for (const block of bashBlocks) {
        const lines = block.split('\n');

        for (const line of lines) {
          const trimmed = line.trimStart();
          if (trimmed.startsWith('#')) continue;
          if (!trimmed.includes('*')) continue;
          // Skip lines where * is inside find -name, git pathspecs, or $(find)
          if (/\bfind\b/.test(trimmed)) continue;
          if (/\bgit\b/.test(trimmed)) continue;
          if (/\$\(find\b/.test(trimmed)) continue;

          // Check 1: "for VAR in <glob>" must use $(find ...) — caught above by the
          // $(find check, so any surviving for-in with a glob pattern is a violation
          if (/\bfor\s+\w+\s+in\b/.test(trimmed) && /\*\./.test(trimmed)) {
            throw new Error(
              `Unsafe for-in glob in ${skill.dir}/SKILL.md: "${trimmed}". ` +
              `Use \`for f in $(find ... -name '*.ext')\` for zsh compatibility.`
            );
          }

          // Check 2: ls/cat/rm/grep with glob file args must have setopt guard
          const isGlobCmd = /\b(?:ls|cat|rm|grep)\b/.test(trimmed) &&
            /(?:\/\*[a-z.*]|\*\.[a-z])/.test(trimmed);
          if (isGlobCmd) {
            expect(block).toContain('setopt +o nomatch');
          }
        }
      }
    }
  });

  test('preamble-using skills have correct skill name in telemetry', () => {
    const PREAMBLE_SKILLS = [
      { dir: '.', name: 'gstack' },
@@ -1599,6 +1637,29 @@ describe('setup script validation', () => {
  });
});

describe('discover-skills hidden directory filtering', () => {
  test('discoverTemplates skips dot-prefixed directories', () => {
    const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-discover-'));
    try {
      // Create a hidden dir with a template (should be excluded)
      fs.mkdirSync(path.join(tmpDir, '.hidden'), { recursive: true });
      fs.writeFileSync(path.join(tmpDir, '.hidden', 'SKILL.md.tmpl'), '---\nname: evil\n---\ntest');
      // Create a visible dir with a template (should be included)
      fs.mkdirSync(path.join(tmpDir, 'visible'), { recursive: true });
      fs.writeFileSync(path.join(tmpDir, 'visible', 'SKILL.md.tmpl'), '---\nname: good\n---\ntest');

      const { discoverTemplates } = require('../scripts/discover-skills');
      const results = discoverTemplates(tmpDir);
      const dirs = results.map((r: { tmpl: string }) => r.tmpl);

      expect(dirs).toContain('visible/SKILL.md.tmpl');
      expect(dirs).not.toContain('.hidden/SKILL.md.tmpl');
    } finally {
      fs.rmSync(tmpDir, { recursive: true, force: true });
    }
  });
});

describe('telemetry', () => {
  test('generated SKILL.md contains telemetry start block', () => {
    const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
@@ -1647,3 +1708,91 @@ describe('telemetry', () => {
  }
  });
});

describe('codex commands must not use inline $(git rev-parse --show-toplevel) for cwd', () => {
  // Regression test: inline $(git rev-parse --show-toplevel) in codex exec -C
  // or codex review without cd evaluates in whatever cwd the background shell
  // inherits, which may be a different project in Conductor workspaces.
  // The fix is to resolve _REPO_ROOT eagerly at the top of each bash block.

  // Scan all source files that could contain codex commands
  // Use Bun.Glob to avoid ELOOP from .claude/skills/gstack symlink back to ROOT
  const tmplGlob = new Bun.Glob('**/*.tmpl');
  const sourceFiles = [
    ...Array.from(tmplGlob.scanSync({ cwd: ROOT, followSymlinks: false })),
    ...fs.readdirSync(path.join(ROOT, 'scripts/resolvers'))
      .filter(f => f.endsWith('.ts'))
      .map(f => `scripts/resolvers/${f}`),
    'scripts/gen-skill-docs.ts',
  ];

  test('no codex exec command uses inline $(git rev-parse --show-toplevel) in -C flag', () => {
    const violations: string[] = [];
    for (const rel of sourceFiles) {
      const abs = path.join(ROOT, rel);
      if (!fs.existsSync(abs)) continue;
      const content = fs.readFileSync(abs, 'utf-8');
      const lines = content.split('\n');
      for (let i = 0; i < lines.length; i++) {
        const line = lines[i];
        if (line.includes('codex exec') && line.includes('-C') && line.includes('$(git rev-parse --show-toplevel)')) {
          violations.push(`${rel}:${i + 1}`);
        }
      }
    }
    expect(violations).toEqual([]);
  });

  test('no generated SKILL.md has codex exec with inline $(git rev-parse --show-toplevel) in -C flag', () => {
    const violations: string[] = [];
    const skillMdGlob = new Bun.Glob('**/SKILL.md');
    const skillMdFiles = Array.from(skillMdGlob.scanSync({ cwd: ROOT, followSymlinks: false }));
    for (const rel of skillMdFiles) {
      const abs = path.join(ROOT, rel);
      if (!fs.existsSync(abs)) continue;
      const content = fs.readFileSync(abs, 'utf-8');
      const lines = content.split('\n');
      for (let i = 0; i < lines.length; i++) {
        const line = lines[i];
        if (line.includes('codex exec') && line.includes('-C') && line.includes('$(git rev-parse --show-toplevel)')) {
          violations.push(`${rel}:${i + 1}`);
        }
      }
    }
    expect(violations).toEqual([]);
  });

  test('codex review commands must be preceded by cd "$_REPO_ROOT" (no -C support)', () => {
    // codex review does not support -C, so the pattern must be:
    //   _REPO_ROOT=$(git rev-parse --show-toplevel) || { ... }
    //   cd "$_REPO_ROOT"
    //   codex review ...
    // NOT: codex review ... with inline $(git rev-parse --show-toplevel)
    const allFiles = [
      ...Array.from(tmplGlob.scanSync({ cwd: ROOT, followSymlinks: false })),
      ...Array.from(new Bun.Glob('**/SKILL.md').scanSync({ cwd: ROOT, followSymlinks: false })),
      ...fs.readdirSync(path.join(ROOT, 'scripts/resolvers'))
        .filter(f => f.endsWith('.ts'))
        .map(f => `scripts/resolvers/${f}`),
      'scripts/gen-skill-docs.ts',
    ];
    const violations: string[] = [];
    for (const rel of allFiles) {
      const abs = path.join(ROOT, rel);
      if (!fs.existsSync(abs)) continue;
      const content = fs.readFileSync(abs, 'utf-8');
      const lines = content.split('\n');
      for (let i = 0; i < lines.length; i++) {
        const line = lines[i];
        // Skip non-executable lines (markdown table cells, prose references)
        if (line.includes('|') && line.includes('`/codex review`')) continue;
        if (line.includes('`codex review`')) continue;
        // Check for codex review with inline $(git rev-parse)
        if (line.includes('codex review') && line.includes('$(git rev-parse --show-toplevel)')) {
          violations.push(`${rel}:${i + 1} — inline git rev-parse in codex review`);
        }
      }
    }
    expect(violations).toEqual([]);
  });
});
@@ -141,13 +141,16 @@ export const E2E_TOUCHFILES: Record<string, string[]> = {
  'benchmark-workflow': ['benchmark/**', 'browse/src/**'],
  'setup-deploy-workflow': ['setup-deploy/**', 'scripts/gen-skill-docs.ts'],

  // Sidebar agent
  'sidebar-navigate': ['browse/src/server.ts', 'browse/src/sidebar-agent.ts', 'browse/src/sidebar-utils.ts', 'extension/**'],
  'sidebar-url-accuracy': ['browse/src/server.ts', 'browse/src/sidebar-agent.ts', 'browse/src/sidebar-utils.ts', 'extension/background.js'],

  // Autoplan
  'autoplan-core': ['autoplan/**', 'plan-ceo-review/**', 'plan-eng-review/**', 'plan-design-review/**'],

  // Skill routing — journey-stage tests (depend on ALL skill descriptions)
  'journey-ideation': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'journey-plan-eng': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'journey-think-bigger': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'journey-debug': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'journey-qa': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
  'journey-code-review': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'],
@@ -262,13 +265,16 @@ export const E2E_TIERS: Record<string, 'gate' | 'periodic'> = {
  'benchmark-workflow': 'gate',
  'setup-deploy-workflow': 'gate',

  // Sidebar agent
  'sidebar-navigate': 'periodic',
  'sidebar-url-accuracy': 'periodic',

  // Autoplan — periodic (not yet implemented)
  'autoplan-core': 'periodic',

  // Skill routing — periodic (LLM routing is non-deterministic)
  'journey-ideation': 'periodic',
  'journey-plan-eng': 'periodic',
  'journey-think-bigger': 'periodic',
  'journey-debug': 'periodic',
  'journey-qa': 'periodic',
  'journey-code-review': 'periodic',
@@ -0,0 +1,77 @@
import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import { execSync, ExecSyncOptionsWithStringEncoding } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';

const ROOT = path.resolve(import.meta.dir, '..');
const BIN = path.join(ROOT, 'bin');

let tmpDir: string;
let slugDir: string;

function run(input: string, opts: { expectFail?: boolean } = {}): { stdout: string; exitCode: number } {
  const execOpts: ExecSyncOptionsWithStringEncoding = {
    cwd: ROOT,
    env: { ...process.env, GSTACK_HOME: tmpDir },
    encoding: 'utf-8',
    timeout: 10000,
  };
  try {
    const stdout = execSync(`${BIN}/gstack-review-log '${input.replace(/'/g, "'\\''")}'`, execOpts).trim();
    return { stdout, exitCode: 0 };
  } catch (e: any) {
    if (opts.expectFail) {
      return { stdout: e.stderr?.toString() || '', exitCode: e.status || 1 };
    }
    throw e;
  }
}

beforeEach(() => {
  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-revlog-'));
  // gstack-review-log uses gstack-slug which needs a git repo — create the projects dir
  // with a predictable slug by pre-creating the directory structure
  slugDir = path.join(tmpDir, 'projects');
  fs.mkdirSync(slugDir, { recursive: true });
});

afterEach(() => {
  fs.rmSync(tmpDir, { recursive: true, force: true });
});

describe('gstack-review-log', () => {
  test('appends valid JSON to review JSONL file', () => {
    const input = '{"skill":"plan-eng-review","status":"clean"}';
    const result = run(input);
    expect(result.exitCode).toBe(0);

    // Find the JSONL file that was written
    const projectDirs = fs.readdirSync(slugDir);
    expect(projectDirs.length).toBeGreaterThan(0);
    const projectDir = path.join(slugDir, projectDirs[0]);
    const jsonlFiles = fs.readdirSync(projectDir).filter(f => f.endsWith('.jsonl'));
    expect(jsonlFiles.length).toBeGreaterThan(0);

    const content = fs.readFileSync(path.join(projectDir, jsonlFiles[0]), 'utf-8').trim();
    const parsed = JSON.parse(content);
    expect(parsed.skill).toBe('plan-eng-review');
    expect(parsed.status).toBe('clean');
  });

  test('rejects non-JSON input with non-zero exit code', () => {
    const result = run('not json at all', { expectFail: true });
    expect(result.exitCode).not.toBe(0);

    // Verify nothing was written
    const projectDirs = fs.readdirSync(slugDir);
    if (projectDirs.length > 0) {
      const projectDir = path.join(slugDir, projectDirs[0]);
      const jsonlFiles = fs.readdirSync(projectDir).filter(f => f.endsWith('.jsonl'));
      if (jsonlFiles.length > 0) {
        const content = fs.readFileSync(path.join(projectDir, jsonlFiles[0]), 'utf-8').trim();
        expect(content).toBe('');
      }
    }
  });
});
@@ -45,7 +45,7 @@ describeIfSelected('Skill E2E tests', [
4. $B screenshot /tmp/skill-e2e-test.png
Report the results of each command.`,
    workingDirectory: tmpDir,
    maxTurns: 5,
    maxTurns: 7,
    timeout: 60_000,
    testName: 'browse-basic',
    runId,

@@ -340,21 +340,22 @@ Write your findings to ${dir}/review-output.md`,
    run('git', ['add', 'app.ts'], dir);
    run('git', ['commit', '-m', 'feat: update to v2'], dir);

    // Copy ship skill
    fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dir, 'ship-SKILL.md'));
    // Extract only Step 0 (base branch detection) from ship/SKILL.md
    // (copying the full 1900-line file causes agent context bloat and flaky timeouts)
    const fullShipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
    const step0Start = fullShipSkill.indexOf('## Step 0: Detect platform and base branch');
    const step0End = fullShipSkill.indexOf('## Step 1: Pre-flight');
    const shipSection = fullShipSkill.slice(step0Start, step0End > step0Start ? step0End : undefined);
    fs.writeFileSync(path.join(dir, 'ship-SKILL.md'), shipSection);

    const result = await runSkillTest({
      prompt: `Read ship-SKILL.md for the ship workflow.
      prompt: `Read ship-SKILL.md. It contains Step 0 (Detect base branch) from the ship workflow.

Skip the preamble bash block, lake intro, telemetry, and contributor mode sections — go straight to Step 0.
Run the base branch detection. Since there is no remote, gh commands will fail — fall back to main.

Run ONLY Step 0 (Detect base branch) and Step 1 (Pre-flight) from the ship workflow.
Since there is no remote, gh commands will fail — fall back to main.
Then run git diff and git log against the detected base branch.

After completing Step 0 and Step 1, STOP. Do NOT proceed to Step 2 or beyond.
Do NOT push, create PRs, or modify VERSION/CHANGELOG.

Write a summary of what you detected to ${dir}/ship-preflight.md including:
Write a summary to ${dir}/ship-preflight.md including:
- The detected base branch name
- The current branch name
- The diff stat against the base branch`,
@@ -580,8 +581,13 @@ describeIfSelected('Review Dashboard Via Attribution', ['review-dashboard-via'],
    ].join('\n'));
    fs.chmodSync(path.join(mockBinDir, 'gstack-review-read'), 0o755);

    // Copy ship skill
    fs.copyFileSync(path.join(ROOT, 'ship', 'SKILL.md'), path.join(dashDir, 'ship-SKILL.md'));
    // Extract only the Review Readiness Dashboard section from ship/SKILL.md
    // (copying the full 1900-line file causes agent context bloat and timeouts)
    const fullSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
    const dashStart = fullSkill.indexOf('## Review Readiness Dashboard');
    const dashEnd = fullSkill.indexOf('\n---\n', dashStart);
    const dashSection = fullSkill.slice(dashStart, dashEnd > dashStart ? dashEnd : undefined);
    fs.writeFileSync(path.join(dashDir, 'ship-SKILL.md'), dashSection);
  });

  afterAll(() => {
@@ -605,7 +611,7 @@ Skip the preamble, lake intro, telemetry, and all other ship steps.
Write the dashboard output to ${dashDir}/dashboard-output.md`,
      workingDirectory: dashDir,
      maxTurns: 12,
      timeout: 90_000,
      timeout: 180_000,
      testName: 'review-dashboard-via',
      runId,
    });
@@ -639,7 +645,7 @@ Write the dashboard output to ${dashDir}/dashboard-output.md`,
    );
    // Ship dashboard should not gate when eng review is clear
    expect(gateQuestions).toHaveLength(0);
  }, 120_000);
  }, 240_000);
});

// Module-level afterAll — finalize eval collector after all tests complete
@@ -0,0 +1,279 @@
/**
 * Layer 4: E2E tests for the sidebar agent.
 *
 * sidebar-url-accuracy: Deterministic test that verifies the activeTabUrl fix.
 * Starts server (no browser), POSTs to /sidebar-command with different activeTabUrl
 * values, reads the queue file, and verifies the prompt uses the extension URL.
 * No real Claude needed — this is a fast, cheap, deterministic test.
 *
 * sidebar-navigate: Full E2E with real Claude (requires ANTHROPIC_API_KEY).
 * Starts server + sidebar-agent, sends a message, waits for Claude to respond.
 * Tests the complete message flow through the queue.
 */

import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
import { spawn, type Subprocess } from 'bun';
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';
import {
  ROOT,
  describeIfSelected, testIfSelected,
  createEvalCollector, finalizeEvalCollector,
} from './helpers/e2e-helpers';

const evalCollector = createEvalCollector('e2e-sidebar');

// --- Sidebar URL Accuracy (deterministic, no Claude) ---

describeIfSelected('Sidebar URL accuracy E2E', ['sidebar-url-accuracy'], () => {
  let serverProc: Subprocess | null = null;
  let serverPort: number = 0;
  let authToken: string = '';
  let tmpDir: string = '';
  let stateFile: string = '';
  let queueFile: string = '';

  async function api(pathname: string, opts: RequestInit = {}): Promise<Response> {
    const headers: Record<string, string> = {
      'Content-Type': 'application/json',
      ...(opts.headers as Record<string, string> || {}),
    };
    if (!headers['Authorization'] && authToken) {
      headers['Authorization'] = `Bearer ${authToken}`;
    }
    return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
  }

  beforeAll(async () => {
    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-e2e-url-'));
    stateFile = path.join(tmpDir, 'browse.json');
    queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
    fs.mkdirSync(path.dirname(queueFile), { recursive: true });

    const serverScript = path.resolve(ROOT, 'browse', 'src', 'server.ts');
    serverProc = spawn(['bun', 'run', serverScript], {
      env: {
        ...process.env,
        BROWSE_STATE_FILE: stateFile,
        BROWSE_HEADLESS_SKIP: '1',
        BROWSE_PORT: '0',
        SIDEBAR_QUEUE_PATH: queueFile,
        BROWSE_IDLE_TIMEOUT: '300',
      },
      stdio: ['ignore', 'pipe', 'pipe'],
    });

    const deadline = Date.now() + 15000;
    while (Date.now() < deadline) {
      if (fs.existsSync(stateFile)) {
        try {
          const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
          if (state.port && state.token) {
            serverPort = state.port;
            authToken = state.token;
            break;
          }
        } catch {}
      }
      await new Promise(r => setTimeout(r, 100));
    }
    if (!serverPort) throw new Error('Server did not start in time');
  }, 20000);

  afterAll(() => {
    if (serverProc) { try { serverProc.kill(); } catch {} }
    finalizeEvalCollector(evalCollector);
    try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
  });

  testIfSelected('sidebar-url-accuracy', async () => {
    // Fresh session
    await api('/sidebar-session/new', { method: 'POST' });
    fs.writeFileSync(queueFile, '');

    const extensionUrl = 'https://example.com/user-navigated-here';
    const resp = await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({
        message: 'What page am I on?',
        activeTabUrl: extensionUrl,
      }),
    });
    expect(resp.status).toBe(200);

    // Wait for queue entry
    let lastEntry: any = null;
    const deadline = Date.now() + 5000;
    while (Date.now() < deadline) {
      await new Promise(r => setTimeout(r, 100));
      if (!fs.existsSync(queueFile)) continue;
      const lines = fs.readFileSync(queueFile, 'utf-8').trim().split('\n').filter(Boolean);
      if (lines.length > 0) {
        lastEntry = JSON.parse(lines[lines.length - 1]);
        break;
      }
    }

    expect(lastEntry).not.toBeNull();
    // Extension URL should be used, not the Playwright fallback
    expect(lastEntry.pageUrl).toBe(extensionUrl);
    expect(lastEntry.prompt).toContain(extensionUrl);
    expect(lastEntry.pageUrl).not.toBe('about:blank');

    // Also test: chrome:// URL should be rejected, falling back to about:blank
    await api('/sidebar-agent/kill', { method: 'POST' });
    fs.writeFileSync(queueFile, '');

    await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({
        message: 'test',
        activeTabUrl: 'chrome://settings',
      }),
    });
    await new Promise(r => setTimeout(r, 200));
    const lines2 = fs.readFileSync(queueFile, 'utf-8').trim().split('\n').filter(Boolean);
    if (lines2.length > 0) {
      const entry2 = JSON.parse(lines2[lines2.length - 1]);
      expect(entry2.pageUrl).toBe('about:blank');
    }

    evalCollector?.addTest({
      name: 'sidebar-url-accuracy', suite: 'Sidebar URL accuracy E2E', tier: 'e2e',
      passed: true,
      duration_ms: 0,
      cost_usd: 0,
      exit_reason: 'success',
    });
  }, 30_000);
});

// --- Sidebar Navigate (real Claude, requires ANTHROPIC_API_KEY) ---

describeIfSelected('Sidebar navigate E2E', ['sidebar-navigate'], () => {
  let serverProc: Subprocess | null = null;
  let agentProc: Subprocess | null = null;
  let serverPort: number = 0;
  let authToken: string = '';
  let tmpDir: string = '';
  let stateFile: string = '';
  let queueFile: string = '';

  async function api(pathname: string, opts: RequestInit = {}): Promise<Response> {
    const headers: Record<string, string> = {
      'Content-Type': 'application/json',
      ...(opts.headers as Record<string, string> || {}),
    };
    if (!headers['Authorization'] && authToken) {
      headers['Authorization'] = `Bearer ${authToken}`;
    }
    return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
  }

  beforeAll(async () => {
    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-e2e-nav-'));
    stateFile = path.join(tmpDir, 'browse.json');
    queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
    fs.mkdirSync(path.dirname(queueFile), { recursive: true });

    // Start the server with browser launch skipped; Claude uses curl/fetch instead
    const serverScript = path.resolve(ROOT, 'browse', 'src', 'server.ts');
    serverProc = spawn(['bun', 'run', serverScript], {
      env: {
        ...process.env,
        BROWSE_STATE_FILE: stateFile,
        BROWSE_HEADLESS_SKIP: '1', // Still skip browser — Claude uses curl/fetch instead
        BROWSE_PORT: '0',
        SIDEBAR_QUEUE_PATH: queueFile,
        BROWSE_IDLE_TIMEOUT: '300',
      },
      stdio: ['ignore', 'pipe', 'pipe'],
    });

    const deadline = Date.now() + 15000;
    while (Date.now() < deadline) {
      if (fs.existsSync(stateFile)) {
        try {
          const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
          if (state.port && state.token) {
            serverPort = state.port;
            authToken = state.token;
            break;
          }
        } catch {}
      }
      await new Promise(r => setTimeout(r, 100));
    }
    if (!serverPort) throw new Error('Server did not start in time');

    // Start sidebar-agent
    const agentScript = path.resolve(ROOT, 'browse', 'src', 'sidebar-agent.ts');
    agentProc = spawn(['bun', 'run', agentScript], {
      env: {
        ...process.env,
        BROWSE_SERVER_PORT: String(serverPort),
        BROWSE_STATE_FILE: stateFile,
        SIDEBAR_QUEUE_PATH: queueFile,
        SIDEBAR_AGENT_TIMEOUT: '90000',
        BROWSE_BIN: 'echo', // browse commands won't work, but Claude can use curl
      },
      stdio: ['ignore', 'pipe', 'pipe'],
    });

    await new Promise(r => setTimeout(r, 1500));
  }, 25000);

  afterAll(() => {
    if (agentProc) { try { agentProc.kill(); } catch {} }
    if (serverProc) { try { serverProc.kill(); } catch {} }
    finalizeEvalCollector(evalCollector);
    try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
  });

  testIfSelected('sidebar-navigate', async () => {
    await api('/sidebar-session/new', { method: 'POST' });
    fs.writeFileSync(queueFile, '');
    const startTime = Date.now();

    // Ask Claude a simple question — it doesn't need browse commands for this
    const resp = await api('/sidebar-command', {
      method: 'POST',
      body: JSON.stringify({
        message: 'Say exactly "SIDEBAR_TEST_OK" and nothing else.',
        activeTabUrl: 'https://example.com',
      }),
    });
    expect(resp.status).toBe(200);

    // Poll for agent_done
    const deadline = Date.now() + 90000;
    let entries: any[] = [];
    while (Date.now() < deadline) {
      const chatResp = await api('/sidebar-chat?after=0');
      const data = await chatResp.json();
      entries = data.entries;
      if (entries.some((e: any) => e.type === 'agent_done')) break;
      await new Promise(r => setTimeout(r, 2000));
    }

    const duration = Date.now() - startTime;
    const doneEntry = entries.find((e: any) => e.type === 'agent_done');
    expect(doneEntry).toBeDefined();

    // Claude should have responded with something
    const agentText = entries
      .filter((e: any) => e.role === 'agent' && (e.type === 'text' || e.type === 'result'))
      .map((e: any) => e.text || '')
      .join(' ');
    expect(agentText.length).toBeGreaterThan(0);

    evalCollector?.addTest({
      name: 'sidebar-navigate', suite: 'Sidebar navigate E2E', tier: 'e2e',
      passed: !!doneEntry && agentText.length > 0,
      duration_ms: duration,
      cost_usd: 0,
      exit_reason: doneEntry ? 'success' : 'timeout',
    });
  }, 120_000);
});

@@ -250,56 +250,10 @@ describeE2E('Skill Routing E2E — Developer Journey', () => {
    }
  }, 150_000);

  testIfSelected('journey-think-bigger', async () => {
    const tmpDir = createRoutingWorkDir('think-bigger');
    try {
      fs.writeFileSync(path.join(tmpDir, 'plan.md'), `# Waitlist App Architecture

## Components
- REST API (Express.js)
- PostgreSQL database
- React frontend
- SMS integration (Twilio)

## Data Model
- restaurants (id, name, settings)
- parties (id, restaurant_id, name, size, phone, status, created_at)
- wait_estimates (id, restaurant_id, avg_wait_minutes)

## API Endpoints
- POST /api/parties - add party to waitlist
- GET /api/parties - list current waitlist
- PATCH /api/parties/:id/status - update party status
- GET /api/estimate - get current wait estimate
`);
      spawnSync('git', ['add', '.'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 });
      spawnSync('git', ['commit', '-m', 'initial'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 });

      const testName = 'journey-think-bigger';
      const expectedSkill = 'plan-ceo-review';
      const result = await runSkillTest({
        prompt: "Actually, looking at this plan again, I feel like we're thinking too small. We're just doing waitlists but what about the whole restaurant guest experience? Is there a bigger opportunity here we should go after?",
        workingDirectory: tmpDir,
        maxTurns: 5,
        allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'],
        timeout: 120_000,
        testName,
        runId,
      });

      const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill');
      const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined;

      logCost(`journey: ${testName}`, result);
      recordRouting(testName, result, expectedSkill, actualSkill);

      expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0);
      const validSkills = ['plan-ceo-review', 'office-hours'];
      expect(validSkills, `Expected one of ${validSkills.join('/')} but got ${actualSkill}`).toContain(actualSkill);
    } finally {
      fs.rmSync(tmpDir, { recursive: true, force: true });
    }
  }, 180_000);
  // Removed: journey-think-bigger
  // Tested ambiguous routing ("think bigger" → plan-ceo-review) but Claude
  // legitimately answers directly instead of routing. Never passed reliably.
  // The other 10 journey tests cover routing with clear signals.

  testIfSelected('journey-debug', async () => {
    const tmpDir = createRoutingWorkDir('debug');

@@ -188,6 +188,82 @@ describe('gstack-telemetry-log', () => {
    expect(events[0]).toHaveProperty('_branch');
  });

  // ─── json_safe() injection prevention tests ────────────────
  test('sanitizes skill name with quote injection attempt', () => {
    setConfig('telemetry', 'anonymous');
    run(`${BIN}/gstack-telemetry-log --skill 'review","injected":"true' --duration 10 --outcome success --session-id inj-1`);

    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    // Must be valid JSON (no injection — quotes stripped, so no field injection possible)
    const event = JSON.parse(lines[0]);
    // The key check: no injected top-level property was created
    expect(event).not.toHaveProperty('injected');
    // Skill field should have quotes stripped but content preserved
    expect(event.skill).not.toContain('"');
  });

  test('truncates skill name exceeding 200 chars', () => {
    setConfig('telemetry', 'anonymous');
    const longSkill = 'a'.repeat(250);
    run(`${BIN}/gstack-telemetry-log --skill '${longSkill}' --duration 10 --outcome success --session-id trunc-1`);

    const events = parseJsonl();
    expect(events[0].skill.length).toBeLessThanOrEqual(200);
  });

  test('sanitizes outcome with newline injection attempt', () => {
    setConfig('telemetry', 'anonymous');
    // Use printf to pass actual newline in the argument
    run(`bash -c 'OUTCOME=$(printf "success\\nfake\\":\\"true"); ${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome "$OUTCOME" --session-id inj-2'`);

    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    const event = JSON.parse(lines[0]);
    expect(event).not.toHaveProperty('fake');
  });

  test('sanitizes session_id with backslash-quote injection', () => {
    setConfig('telemetry', 'anonymous');
    run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome success --session-id 'id\\\\"","x":"y'`);

    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    const event = JSON.parse(lines[0]);
    expect(event).not.toHaveProperty('x');
  });

  test('sanitizes error_class with quote injection', () => {
    setConfig('telemetry', 'anonymous');
    run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-class 'timeout","extra":"val' --session-id inj-3`);

    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    const event = JSON.parse(lines[0]);
    expect(event).not.toHaveProperty('extra');
  });

  test('sanitizes failed_step with quote injection', () => {
    setConfig('telemetry', 'anonymous');
    run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --failed-step 'step1","hacked":"yes' --session-id inj-4`);

    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    const event = JSON.parse(lines[0]);
    expect(event).not.toHaveProperty('hacked');
  });

  test('escapes error_message quotes and preserves content', () => {
    setConfig('telemetry', 'anonymous');
    run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-message 'Error: file "test.txt" not found' --session-id inj-5`);

    const lines = readJsonl();
    expect(lines).toHaveLength(1);
    const event = JSON.parse(lines[0]);
    expect(event.error_message).toContain('file');
    expect(event.error_message).toContain('not found');
  });

  test('creates analytics directory if missing', () => {
    // Remove analytics dir
    const analyticsDir = path.join(tmpDir, 'analytics');