From 421460f03ac9f917a3bf02934e5be30ff52d2e25 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Tue, 9 Jun 2026 21:02:30 -0700 Subject: [PATCH 1/2] v1.57.8.0 feat: browse js/eval --out render-to-file (canonical Chromium for offline rendering) (#1929) * feat(browse): js/eval --out render-to-file with write-capability gate Add --out / --raw to js and eval so an evaluate result is written straight to disk (base64 data URLs auto-decoded to bytes, charset-validated before decode, parent dirs created) instead of serialized back through the CLI. --out is modeled as a per-invocation WRITE: it requires write scope, is never dispatchable over the pair-agent tunnel (canDispatchOverTunnel now consults args), and counts as a mutation for watch-mode and tab-ownership. Shared parseOutArgs/hasOutArg/resultToString helpers keep the handler and the gate in sync. Tests cover the parser, render-to-file paths, and tunnel guards. * docs(browse): offline render mode + canonical-Chromium guidance Document the blessed offline-render path (headless, no proxy/Xvfb): visual output via screenshot --selector, bytes a function returns via js --out. Add the puppeteer->browse cheatsheet row, a "don't bundle your own Chromium" note (browse skill + CONTRIBUTING), and the --out/--raw command descriptions. Regenerate browse/SKILL.md, SKILL.md, and gstack/llms.txt from the templates. * chore: bump version and changelog (v1.59.1.0) Co-Authored-By: Claude Opus 4.8 (1M context) * docs: document js/eval --out render-to-file in BROWSER.md reference (v1.59.1.0) The js and eval reference rows in BROWSER.md drifted: every other reference surface (SKILL.md, gstack/llms.txt, browse/SKILL.md) already shows the new [--out ] [--raw] flags from v1.59.1.0, but the complete browser reference still showed the pre-feature signatures. Add the flags plus the WRITE-capability / no-tunnel note so the reference matches what shipped. Co-Authored-By: Claude Opus 4.8 (1M context) * chore: re-version 1.59.1.0 -> 1.57.8.0 (natural PATCH from 1.57.7.0) Co-Authored-By: Claude Opus 4.8 (1M context) --------- Co-authored-by: Claude Opus 4.8 (1M context) --- BROWSER.md | 4 +- CHANGELOG.md | 67 ++++++++++++ CONTRIBUTING.md | 8 ++ SKILL.md | 4 +- VERSION | 2 +- browse/SKILL.md | 58 +++++++++- browse/SKILL.md.tmpl | 54 +++++++++ browse/src/commands.ts | 4 +- browse/src/read-commands.ts | 137 +++++++++++++++++++++-- browse/src/server.ts | 47 ++++++-- browse/test/commands.test.ts | 157 ++++++++++++++++++++++++++- browse/test/tunnel-gate-unit.test.ts | 32 ++++++ gstack/llms.txt | 4 +- package.json | 2 +- 14 files changed, 553 insertions(+), 27 deletions(-) diff --git a/BROWSER.md b/BROWSER.md index 2c57f1d6e..eb69e8869 100644 --- a/BROWSER.md +++ b/BROWSER.md @@ -212,8 +212,8 @@ from `snapshot`, or `@c` refs from `snapshot -C`. Full table: | Command | Description | |---------|-------------| -| `js ` | Run inline JavaScript expression in page context, return as string | -| `eval ` | Run JS from a file (path under /tmp or cwd; same sandbox as `js`) | +| `js [--out ] [--raw]` | Run inline JavaScript expression in page context, return as string. With `--out ` the result is written to disk instead of returned (a `data:*;base64,...` result is decoded to raw bytes unless `--raw`). `--out` makes the invocation a WRITE (needs `write` scope, never allowed over the tunnel). | +| `eval [--out ] [--raw]` | Run JS from a file (path under /tmp or cwd; same sandbox as `js`). `--out`/`--raw` behave as for `js`. | | `css ` | Computed CSS value | | `attrs ` | Element attributes as JSON | | `is ` | State check: visible, hidden, enabled, disabled, checked, editable, focused | diff --git a/CHANGELOG.md b/CHANGELOG.md index 967255d61..438536cad 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,72 @@ # Changelog +## [1.57.8.0] - 2026-06-09 + +## **`browse` is now the one Chromium on the box, for offline rendering too.** +## **`js`/`eval --out ` writes a render straight to disk, so skills stop bundling their own puppeteer.** + +You can now turn your own local HTML or JSON into a PNG (or any bytes) on disk +through the same headless `browse` Chromium you already run, with no second +browser install. `js "" --out out.png` and `eval script.js --out out.png` +write the evaluate result to a file instead of returning it. When the result is a +base64 data URL (the shape Excalidraw exports, og-image generators, and card +renderers hand back), `--out` decodes it to raw bytes for you; pass `--raw` to +write the literal string. Malformed base64 errors loudly instead of writing a +corrupt file, and missing parent directories are created. This closes the gap that +made local-render skills each `npm i puppeteer` and download a drifting second +Chromium. + +### The numbers that matter + +No synthetic benchmark — these are structural facts of the change, verifiable from +the diff and a one-line smoke (`browse load-html` → `screenshot --selector` / +`js --out`). + +| For a skill that rasterizes local HTML/JSON | Before | After | +|---|---|---| +| Chromium installs per box | 2+ (browse + each skill's own puppeteer) | 1 (shared `browse`) | +| Getting a PNG from a render function | `evaluate` → multi-MB data URL over the CLI channel → hand-decode base64 → write | `js --out` decodes and writes server-side; only a short status crosses the channel | +| Render-to-file primitive | none | `js`/`eval --out [--raw]` | + +The blessed offline path is documented in the browse skill: visual output goes +through `screenshot --selector` (the picture never crosses the CDP wire), and bytes +a function returns go through `js --out`. + +### What this means for you + +If you write skills that draw diagrams, cards, or og-images, point them at `browse` +and delete the bundled Chromium. One version to pin, one daemon to manage. `--out` +is treated as a write everywhere it matters: it needs the `write` scope, is blocked +over the pair-agent tunnel, and is gated in watch mode, so a remote agent can never +use it to write to your disk. + +### Itemized changes + +#### Added +- **`js` / `eval --out ` render-to-file** (`browse/src/read-commands.ts`). + Writes the evaluate result to disk and returns a short `... result written: + ( bytes)` status. A `data:;base64,...` result is decoded to raw bytes + (case-insensitive header parse, split on the first comma, base64-charset validated + before decode); `--raw` forces a literal write. Parent directories are created. +- **`--raw` flag** to bypass data-URL decoding and write the literal result string. +- **Offline render mode docs** in the browse skill: an explicit headless, no-proxy, + no-Xvfb path with a worked example showing visual (`screenshot --selector`) vs + bytes (`js --out`), a puppeteer→browse cheatsheet row, and a "don't bundle your + own Chromium" note (also in CONTRIBUTING.md). + +#### Changed +- **`--out` is a per-invocation WRITE capability** (`browse/src/server.ts`). + `js`/`eval` stay read commands, but an `--out` invocation requires the `write` + scope, is never dispatchable over the tunnel surface (`canDispatchOverTunnel` now + consults args), and counts as a mutation for watch-mode and tab-ownership gates. + +#### For contributors +- New tests: `parseOutArgs`/`hasOutArg` unit coverage (`--out`/`--out=`, `--raw`, + repeats, missing value, ordering), `--out` render-to-file integration (large + string, data-URL→PNG, `--raw`, malformed-base64, outside-safe-dir, mkdir, eval + parity, byte-for-byte null/undefined), and tunnel-gate guards proving `--out` + is never tunnel-dispatchable. + ## [1.57.7.0] - 2026-06-08 ## **Every plan review now ends by telling you, in one line, whether anything is still unresolved.** diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e67a307d1..a4872fc47 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -232,6 +232,14 @@ For template authoring best practices (natural language over bash-isms, dynamic To add a browse command, add it to `browse/src/commands.ts`. To add a snapshot flag, add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts`. Then rebuild. +**Don't bundle puppeteer/Chromium in a skill.** `browse` is the one shared +Chromium per box, including offline local-render workloads. A skill that needs to +rasterize its own HTML/JSON (diagrams, cards, og-images) should route through +`browse` — `screenshot --selector` for visual output, `load-html` + `js --out` for +bytes a render function returns — instead of `npm i puppeteer` and downloading a +second Chromium that drifts out of version sync. One install to pin, one daemon to +manage. + ## Jargon list (V1 writing style) gstack's Writing Style section (injected into every tier-≥2 skill's preamble) diff --git a/SKILL.md b/SKILL.md index 0b06b802b..8711ae7f3 100644 --- a/SKILL.md +++ b/SKILL.md @@ -917,10 +917,10 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`. | `cookies` | All cookies as JSON | | `css ` | Computed CSS value | | `dialog [--clear]` | Dialog messages | -| `eval ` | Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners. | +| `eval [--out ] [--raw]` | Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners. With --out , the result is written to disk (base64 data URL decoded to bytes unless --raw); --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel). | | `inspect [selector] [--all] [--history]` | Deep CSS inspection via CDP — full rule cascade, box model, computed styles | | `is ` | State check on element. Valid values: visible, hidden, enabled, disabled, checked, editable, focused (case-sensitive). accepts a CSS selector OR an @ref token from a prior snapshot (e.g. @e3, @c1) — refs are interchangeable with selectors anywhere a selector is expected. | -| `js ` | Run inline JavaScript expression in the page context and return result as string. Same JS sandbox as eval; the only difference is js takes an inline expr while eval reads from a file. | +| `js [--out ] [--raw]` | Run inline JavaScript expression in the page context and return result as string. Same JS sandbox as eval; the only difference is js takes an inline expr while eval reads from a file. With --out , the result is written to disk instead of returned (a base64 data URL is decoded to raw bytes unless --raw is given) — ideal for rasterizing local renders to PNG without serializing megabytes back through the CLI. --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel). | | `network [--clear]` | Network requests | | `perf` | Page load timings | | `storage | storage set ` | Read both localStorage and sessionStorage as JSON. With "set ", write to localStorage only (sessionStorage is read-only via this command — set it with `js sessionStorage.setItem(...)`). | diff --git a/VERSION b/VERSION index bb68a65d9..caf2638d9 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -1.57.7.0 +1.57.8.0 diff --git a/browse/SKILL.md b/browse/SKILL.md index e36fc9c86..0f670d47a 100644 --- a/browse/SKILL.md +++ b/browse/SKILL.md @@ -644,6 +644,51 @@ $B screenshot /tmp/out.png --selector .tweet-card ``` Scale must be 1-3 (gstack policy cap). Changing `--scale` recreates the browser context; refs from `snapshot` are invalidated (rerun `snapshot`), but `load-html` content is replayed automatically. Not supported in headed mode. +### 14. Offline render mode (rasterize your own HTML/JSON, zero network) + +This is the blessed path for "I just want to turn my own local HTML or JSON into a +PNG/PDF/bytes on disk" — Excalidraw diagrams, tweet/quote cards, og-images, +report rasterization. It is **plain headless, shared Chromium, no proxy, no Xvfb, +no anti-bot stealth**. Default `$B` is already exactly this; you do not pass +`--headed` or `--proxy`. One Chromium per box, shared by every skill — **do not +`npm i puppeteer` and ship a second browser** (see the note under the cheatsheet). + +Two output shapes, pick by what you have: + +**A) Visual output → `screenshot --selector` (preferred).** If the thing you want +is a picture of something on the page, screenshot it. The PNG is written from the +browser process straight to disk — the image bytes never cross the CDP wire. + +```bash +echo '
hi
' > /tmp/card.html +$B viewport 480x600 --scale 2 +$B load-html /tmp/card.html +$B screenshot /tmp/card.png --selector '#card' # disk path — no megabytes over CDP +``` +(Use the disk path, NOT `screenshot --base64` — base64 serializes the bytes back +through the command channel, which is the cost you're trying to avoid.) + +**B) Bytes a function returns → `js --out` / `eval --out`.** When a library hands +you the result as a return value (a base64 data URL, a blob, computed JSON) rather +than painting a stable element — e.g. Excalidraw's export function returns a PNG +data URL — write the evaluate result straight to disk. `--out` decodes a +`data:*;base64,...` result to raw bytes automatically (pass `--raw` to write the +literal string). The payload is written by the daemon and never serialized back +out to the CLI/stdout. + +```bash +# Load the render bundle, signal readiness, then render-to-file. +$B load-html /tmp/excalidraw-export.html # bundle sets window.__render + a #done flag +$B wait '#done' # deterministic ready handshake +$B js "window.__render(SCENE_JSON)" --out /tmp/diagram.png # data URL → decoded PNG on disk +``` + +`--out` is a WRITE: it needs the `write` scope and is never allowed over the +pair-agent tunnel (a remote agent can't write to your disk). Parent directories +are created; malformed base64 errors instead of writing corrupt bytes. Pick A when +you can (no CDP transfer at all); reach for B only when the bytes come back as a +return value. + ## Puppeteer → browse cheatsheet Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow: @@ -657,6 +702,8 @@ Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow: | `await (await page.$('.x')).screenshot({path})` | `$B screenshot --selector .x` | | `await page.screenshot({fullPage: true, path})` | `$B screenshot ` (full page default) | | `await page.screenshot({clip: {x, y, w, h}, path})` | `$B screenshot --clip x,y,w,h` | +| `const r = await page.evaluate(fn)` | `$B js ""` (result to stdout) | +| `fs.writeFileSync(out, Buffer.from(dataUrl.split(',')[1],'base64'))` | `$B js "" --out ` (data URL auto-decoded) | Worked example (the tweet-renderer flow — Puppeteer → browse): @@ -671,6 +718,13 @@ $B screenshot /tmp/out.png --selector .tweet-card Aliases: typing `setcontent` or `set-content` routes to `load-html` automatically. Typing a typo (`load-htm`) returns `Did you mean 'load-html'?`. +**Don't bundle your own puppeteer/Chromium.** `browse` is the one shared Chromium +per box. Skills that need to rasterize local HTML/JSON (diagrams, cards, og-images) +should route through `browse` — `screenshot --selector` for visual output, +`load-html` + `js --out` for bytes a function returns — instead of +`npm i puppeteer` and downloading a second Chromium that drifts out of version sync. +One install to pin, one daemon's lifecycle to manage. + ## User Handoff When you hit something you can't handle in headless mode (CAPTCHA, complex auth, multi-factor @@ -875,10 +929,10 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero | `cookies` | All cookies as JSON | | `css ` | Computed CSS value | | `dialog [--clear]` | Dialog messages | -| `eval ` | Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners. | +| `eval [--out ] [--raw]` | Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners. With --out , the result is written to disk (base64 data URL decoded to bytes unless --raw); --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel). | | `inspect [selector] [--all] [--history]` | Deep CSS inspection via CDP — full rule cascade, box model, computed styles | | `is ` | State check on element. Valid values: visible, hidden, enabled, disabled, checked, editable, focused (case-sensitive). accepts a CSS selector OR an @ref token from a prior snapshot (e.g. @e3, @c1) — refs are interchangeable with selectors anywhere a selector is expected. | -| `js ` | Run inline JavaScript expression in the page context and return result as string. Same JS sandbox as eval; the only difference is js takes an inline expr while eval reads from a file. | +| `js [--out ] [--raw]` | Run inline JavaScript expression in the page context and return result as string. Same JS sandbox as eval; the only difference is js takes an inline expr while eval reads from a file. With --out , the result is written to disk instead of returned (a base64 data URL is decoded to raw bytes unless --raw is given) — ideal for rasterizing local renders to PNG without serializing megabytes back through the CLI. --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel). | | `network [--clear]` | Network requests | | `perf` | Page load timings | | `storage | storage set ` | Read both localStorage and sessionStorage as JSON. With "set ", write to localStorage only (sessionStorage is read-only via this command — set it with `js sessionStorage.setItem(...)`). | diff --git a/browse/SKILL.md.tmpl b/browse/SKILL.md.tmpl index a466fc446..9a159e4c9 100644 --- a/browse/SKILL.md.tmpl +++ b/browse/SKILL.md.tmpl @@ -135,6 +135,51 @@ $B screenshot /tmp/out.png --selector .tweet-card ``` Scale must be 1-3 (gstack policy cap). Changing `--scale` recreates the browser context; refs from `snapshot` are invalidated (rerun `snapshot`), but `load-html` content is replayed automatically. Not supported in headed mode. +### 14. Offline render mode (rasterize your own HTML/JSON, zero network) + +This is the blessed path for "I just want to turn my own local HTML or JSON into a +PNG/PDF/bytes on disk" — Excalidraw diagrams, tweet/quote cards, og-images, +report rasterization. It is **plain headless, shared Chromium, no proxy, no Xvfb, +no anti-bot stealth**. Default `$B` is already exactly this; you do not pass +`--headed` or `--proxy`. One Chromium per box, shared by every skill — **do not +`npm i puppeteer` and ship a second browser** (see the note under the cheatsheet). + +Two output shapes, pick by what you have: + +**A) Visual output → `screenshot --selector` (preferred).** If the thing you want +is a picture of something on the page, screenshot it. The PNG is written from the +browser process straight to disk — the image bytes never cross the CDP wire. + +```bash +echo '
hi
' > /tmp/card.html +$B viewport 480x600 --scale 2 +$B load-html /tmp/card.html +$B screenshot /tmp/card.png --selector '#card' # disk path — no megabytes over CDP +``` +(Use the disk path, NOT `screenshot --base64` — base64 serializes the bytes back +through the command channel, which is the cost you're trying to avoid.) + +**B) Bytes a function returns → `js --out` / `eval --out`.** When a library hands +you the result as a return value (a base64 data URL, a blob, computed JSON) rather +than painting a stable element — e.g. Excalidraw's export function returns a PNG +data URL — write the evaluate result straight to disk. `--out` decodes a +`data:*;base64,...` result to raw bytes automatically (pass `--raw` to write the +literal string). The payload is written by the daemon and never serialized back +out to the CLI/stdout. + +```bash +# Load the render bundle, signal readiness, then render-to-file. +$B load-html /tmp/excalidraw-export.html # bundle sets window.__render + a #done flag +$B wait '#done' # deterministic ready handshake +$B js "window.__render(SCENE_JSON)" --out /tmp/diagram.png # data URL → decoded PNG on disk +``` + +`--out` is a WRITE: it needs the `write` scope and is never allowed over the +pair-agent tunnel (a remote agent can't write to your disk). Parent directories +are created; malformed base64 errors instead of writing corrupt bytes. Pick A when +you can (no CDP transfer at all); reach for B only when the bytes come back as a +return value. + ## Puppeteer → browse cheatsheet Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow: @@ -148,6 +193,8 @@ Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow: | `await (await page.$('.x')).screenshot({path})` | `$B screenshot --selector .x` | | `await page.screenshot({fullPage: true, path})` | `$B screenshot ` (full page default) | | `await page.screenshot({clip: {x, y, w, h}, path})` | `$B screenshot --clip x,y,w,h` | +| `const r = await page.evaluate(fn)` | `$B js ""` (result to stdout) | +| `fs.writeFileSync(out, Buffer.from(dataUrl.split(',')[1],'base64'))` | `$B js "" --out ` (data URL auto-decoded) | Worked example (the tweet-renderer flow — Puppeteer → browse): @@ -162,6 +209,13 @@ $B screenshot /tmp/out.png --selector .tweet-card Aliases: typing `setcontent` or `set-content` routes to `load-html` automatically. Typing a typo (`load-htm`) returns `Did you mean 'load-html'?`. +**Don't bundle your own puppeteer/Chromium.** `browse` is the one shared Chromium +per box. Skills that need to rasterize local HTML/JSON (diagrams, cards, og-images) +should route through `browse` — `screenshot --selector` for visual output, +`load-html` + `js --out` for bytes a function returns — instead of +`npm i puppeteer` and downloading a second Chromium that drifts out of version sync. +One install to pin, one daemon's lifecycle to manage. + ## User Handoff When you hit something you can't handle in headless mode (CAPTCHA, complex auth, multi-factor diff --git a/browse/src/commands.ts b/browse/src/commands.ts index 7e647a002..73bc9ab1b 100644 --- a/browse/src/commands.ts +++ b/browse/src/commands.ts @@ -106,8 +106,8 @@ export const COMMAND_DESCRIPTIONS: Record' }, - 'eval': { category: 'Inspection', description: 'Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners.', usage: 'eval ' }, + 'js': { category: 'Inspection', description: 'Run inline JavaScript expression in the page context and return result as string. Same JS sandbox as eval; the only difference is js takes an inline expr while eval reads from a file. With --out , the result is written to disk instead of returned (a base64 data URL is decoded to raw bytes unless --raw is given) — ideal for rasterizing local renders to PNG without serializing megabytes back through the CLI. --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel).', usage: 'js [--out ] [--raw]' }, + 'eval': { category: 'Inspection', description: 'Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners. With --out , the result is written to disk (base64 data URL decoded to bytes unless --raw); --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel).', usage: 'eval [--out ] [--raw]' }, 'css': { category: 'Inspection', description: 'Computed CSS value', usage: 'css ' }, 'attrs': { category: 'Inspection', description: 'Element attributes as JSON', usage: 'attrs ' }, 'is': { category: 'Inspection', description: 'State check on element. Valid values: visible, hidden, enabled, disabled, checked, editable, focused (case-sensitive). accepts a CSS selector OR an @ref token from a prior snapshot (e.g. @e3, @c1) — refs are interchangeable with selectors anywhere a selector is expected.', usage: 'is ' }, diff --git a/browse/src/read-commands.ts b/browse/src/read-commands.ts index 486eac18a..4e1371a17 100644 --- a/browse/src/read-commands.ts +++ b/browse/src/read-commands.ts @@ -13,7 +13,7 @@ import * as fs from 'fs'; import * as path from 'path'; import { TEMP_DIR } from './platform'; import { inspectElement, formatInspectorResult, getModificationHistory } from './cdp-inspector'; -import { validateReadPath } from './path-security'; +import { validateReadPath, validateOutputPath } from './path-security'; import { stripLoneSurrogates } from './sanitize'; // Re-export for backward compatibility (tests import from read-commands) export { validateReadPath } from './path-security'; @@ -46,6 +46,117 @@ function wrapForEvaluate(code: string): string { : `(async()=>(${trimmed}))()`; } +/** Flags split out of `js`/`eval` args by parseOutArgs. */ +export interface OutArgs { + outPath?: string; + raw: boolean; + rest: string[]; +} + +/** + * Parse `--out ` / `--out=` and `--raw` / `--raw=true|false` out of an + * arg list, returning the flags plus the remaining positional args (`rest`). + * + * Single source of truth shared by the js/eval handlers and the write-capability + * gate in server.ts, so the two never disagree on what counts as an `--out` + * invocation. Throws on malformed usage (repeated `--out`, missing value, bad + * `--raw` value) so the user gets a clear error instead of a silent misparse. + */ +export function parseOutArgs(args: string[]): OutArgs { + let outPath: string | undefined; + let raw = false; + const rest: string[] = []; + for (let i = 0; i < args.length; i++) { + const a = args[i]; + if (a === '--out') { + if (outPath !== undefined) throw new Error('--out specified more than once'); + const val = args[i + 1]; + if (val === undefined || val.startsWith('--')) throw new Error('--out requires a file path'); + outPath = val; + i++; + } else if (a.startsWith('--out=')) { + if (outPath !== undefined) throw new Error('--out specified more than once'); + const val = a.slice('--out='.length); + if (val === '') throw new Error('--out requires a file path'); + outPath = val; + } else if (a === '--raw') { + raw = true; + } else if (a.startsWith('--raw=')) { + const v = a.slice('--raw='.length).toLowerCase(); + if (v !== 'true' && v !== 'false') throw new Error('--raw must be true or false'); + raw = v === 'true'; + } else { + rest.push(a); + } + } + return { outPath, raw, rest }; +} + +/** + * True iff an arg list contains an `--out` flag in any accepted form + * (`--out ` or `--out=`). Used by the write-capability gate to + * decide whether an otherwise-read command (`js`/`eval`) is actually a write + * invocation. Mirrors parseOutArgs's `--out` recognition exactly. Never throws — + * a malformed `--out=` still counts as an out attempt (fail safe: gate it). + */ +export function hasOutArg(args: string[]): boolean { + return args.some(a => a === '--out' || a.startsWith('--out=')); +} + +/** + * Convert an evaluate() result to its string form — the exact conversion `js`/`eval` + * used inline before `--out` existed. Kept byte-for-byte: `typeof === 'object'` + * (which includes `null`) goes through JSON.stringify (so `null` → `"null"`); + * everything else via `String(result ?? '')` (so `undefined` → `''`). JSON.stringify + * still throws on circular / BigInt-bearing results, same as before. + */ +export function resultToString(result: unknown): string { + return typeof result === 'object' + ? JSON.stringify(result, null, 2) + : String(result ?? ''); +} + +/** + * Write an evaluate result string to disk for `--out`, returning bytes written. + * + * When the result is a base64 data URL (`data:;...;base64,`) and + * `raw` is false, decode the payload to raw bytes — this is the Excalidraw / og-image + * path where a render function returns a PNG data URL. The header is parsed + * case-insensitively and split on the FIRST comma (data URLs can contain commas in + * the payload). The payload is validated against the base64 charset before decoding, + * because `Buffer.from(_, 'base64')` silently drops invalid characters and would + * otherwise write corrupted bytes. `--raw` forces a literal write even for data URLs. + * + * Non-base64 strings are surrogate-sanitized (matching what the stdout egress path + * did before) and written as UTF-8. Parent directories are created — validateOutputPath + * gates the location but does not mkdir. + */ +export function writeEvalResult(outPath: string, str: string, opts: { raw: boolean }): number { + validateOutputPath(outPath); + fs.mkdirSync(path.dirname(path.resolve(outPath)), { recursive: true }); + + if (!opts.raw && str.startsWith('data:')) { + const comma = str.indexOf(','); + if (comma !== -1) { + const header = str.slice('data:'.length, comma); + const tokens = header.split(';').map(t => t.trim().toLowerCase()); + if (tokens.includes('base64')) { + const payload = str.slice(comma + 1).replace(/\s+/g, ''); + if (!/^[A-Za-z0-9+/]*={0,2}$/.test(payload)) { + throw new Error('--out: malformed base64 in data URL (decode would corrupt output)'); + } + const buf = Buffer.from(payload, 'base64'); + fs.writeFileSync(outPath, buf); + return buf.length; + } + } + } + + const buf = Buffer.from(stripLoneSurrogates(str), 'utf-8'); + fs.writeFileSync(outPath, buf); + return buf.length; +} + /** * Extract clean text from a page (strips script/style/noscript/svg). * Exported for DRY reuse in meta-commands (diff). @@ -179,24 +290,36 @@ export async function handleReadCommand( } case 'js': { - const expr = args[0]; - if (!expr) throw new Error('Usage: browse js '); + const { outPath, raw, rest } = parseOutArgs(args); + const expr = rest[0]; + if (!expr) throw new Error('Usage: browse js [--out ] [--raw]'); if (bm) assertJsOriginAllowed(bm, page.url()); const wrapped = wrapForEvaluate(expr); const result = await target.evaluate(wrapped); - return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? ''); + const str = resultToString(result); + if (outPath) { + const n = writeEvalResult(outPath, str, { raw }); + return `JS result written: ${outPath} (${n} bytes)`; + } + return str; } case 'eval': { - const filePath = args[0]; - if (!filePath) throw new Error('Usage: browse eval '); + const { outPath, raw, rest } = parseOutArgs(args); + const filePath = rest[0]; + if (!filePath) throw new Error('Usage: browse eval [--out ] [--raw]'); if (bm) assertJsOriginAllowed(bm, page.url()); validateReadPath(filePath); if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`); const code = fs.readFileSync(filePath, 'utf-8'); const wrapped = wrapForEvaluate(code); const result = await target.evaluate(wrapped); - return typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? ''); + const str = resultToString(result); + if (outPath) { + const n = writeEvalResult(outPath, str, { raw }); + return `Eval result written: ${outPath} (${n} bytes)`; + } + return str; } case 'css': { diff --git a/browse/src/server.ts b/browse/src/server.ts index 6f75551ff..301781acc 100644 --- a/browse/src/server.ts +++ b/browse/src/server.ts @@ -14,7 +14,7 @@ */ import { BrowserManager } from './browser-manager'; -import { handleReadCommand } from './read-commands'; +import { handleReadCommand, hasOutArg } from './read-commands'; import { handleWriteCommand } from './write-commands'; import { handleMetaCommand } from './meta-commands'; import { handleCookiePickerRoute, hasActivePicker } from './cookie-picker-routes'; @@ -330,9 +330,15 @@ export const TUNNEL_COMMANDS = new Set([ * without standing up an HTTP listener. Behavior is identical to the inline * check; the function canonicalizes the command (so aliases hit the same set) * and returns false for null/undefined input. + * + * `args` is consulted so an `--out` invocation (e.g. `eval --out `) is + * NEVER tunnel-dispatchable: `--out` turns an otherwise-readable command into a + * local-disk WRITE, and the tunnel surface never grants disk-write capability to + * remote paired agents. Omitting `args` preserves the old command-only behavior. */ -export function canDispatchOverTunnel(command: string | undefined | null): boolean { +export function canDispatchOverTunnel(command: string | undefined | null, args?: string[]): boolean { if (typeof command !== 'string' || command.length === 0) return false; + if (Array.isArray(args) && hasOutArg(args)) return false; const cmd = canonicalizeCommand(command); return TUNNEL_COMMANDS.has(cmd); } @@ -716,6 +722,19 @@ if (BROWSE_PARENT_PID > 0 && !IS_HEADED_WATCHDOG) { import { READ_COMMANDS, WRITE_COMMANDS, META_COMMANDS } from './commands'; export { READ_COMMANDS, WRITE_COMMANDS, META_COMMANDS }; +/** + * Whether an invocation should be treated as a WRITE for capability gating + * (scope, watch-mode block, tab ownership, tunnel). A command is a write if it + * mutates state (`WRITE_COMMANDS`) OR it carries an `--out` flag — `js`/`eval + * --out` writes the evaluate result to local disk, so the capability is + * per-invocation, not per-command-name. This deliberately does NOT change + * dispatch routing: `js`/`eval` still route to `handleReadCommand`; only the + * security gates consult this. + */ +function isWriteInvocation(command: string, args: string[]): boolean { + return WRITE_COMMANDS.has(command) || hasOutArg(args); +} + // ─── Inspector State (in-memory) ────────────────────────────── let inspectorData: InspectorResult | null = null; let inspectorTimestamp: number = 0; @@ -957,6 +976,19 @@ async function handleCommandInternalImpl( }; } + // `--out` writes the evaluate result to local disk, which is a WRITE + // capability distinct from the JS-exec (admin) capability js/eval need. + // Require write scope so an admin-but-not-write token can't write files. + if (hasOutArg(args) && !tokenInfo.scopes.includes('write')) { + return { + status: 403, json: true, + result: JSON.stringify({ + error: `"--out" writes to disk and requires the "write" scope`, + hint: `Your scopes: ${tokenInfo.scopes.join(', ')}. Re-pair with write access to use --out.`, + }), + }; + } + // Domain check for navigation commands if ((command === 'goto' || command === 'newtab') && args[0]) { if (!checkDomain(tokenInfo, args[0])) { @@ -1011,7 +1043,7 @@ async function handleCommandInternalImpl( // Skip for `newtab` — it creates a tab rather than accessing one. if (command !== 'newtab' && tokenInfo && tokenInfo.clientId !== 'root' && tokenInfo.tabPolicy === 'own-only') { const targetTab = tabId ?? browserManager.getActiveTabId(); - if (!browserManager.checkTabAccess(targetTab, tokenInfo.clientId, { isWrite: WRITE_COMMANDS.has(command), ownOnly: true })) { + if (!browserManager.checkTabAccess(targetTab, tokenInfo.clientId, { isWrite: isWriteInvocation(command, args), ownOnly: true })) { return { status: 403, json: true, result: JSON.stringify({ @@ -1035,8 +1067,9 @@ async function handleCommandInternalImpl( }; } - // Block mutation commands while watching (read-only observation mode) - if (browserManager.isWatching() && WRITE_COMMANDS.has(command)) { + // Block mutation commands while watching (read-only observation mode). + // `--out` invocations count as mutations (they write the result to disk). + if (browserManager.isWatching() && isWriteInvocation(command, args)) { return { status: 400, json: true, result: JSON.stringify({ error: 'Cannot run mutation commands while watching. Run `$B watch stop` first.' }), @@ -2650,11 +2683,11 @@ export function buildFetchHandler(cfg: ServerConfig): ServerHandle { // Paired remote agents drive the browser but cannot configure the // daemon, launch new browsers, import cookies, or rotate tokens. if (surface === 'tunnel') { - if (!canDispatchOverTunnel(body?.command)) { + if (!canDispatchOverTunnel(body?.command, body?.args)) { logTunnelDenial(req, url, `disallowed_command:${body?.command}`); return new Response(JSON.stringify({ error: `Command '${body?.command}' is not allowed over the tunnel surface`, - hint: `Tunnel commands: ${[...TUNNEL_COMMANDS].sort().join(', ')}`, + hint: `Tunnel commands: ${[...TUNNEL_COMMANDS].sort().join(', ')}. Note: --out (disk write) is never allowed over the tunnel.`, }), { status: 403, headers: { 'Content-Type': 'application/json' } }); } } diff --git a/browse/test/commands.test.ts b/browse/test/commands.test.ts index b3870c0cc..9382cb27e 100644 --- a/browse/test/commands.test.ts +++ b/browse/test/commands.test.ts @@ -9,7 +9,7 @@ import { describe, test, expect, beforeAll, afterAll } from 'bun:test'; import { startTestServer } from './test-server'; import { BrowserManager } from '../src/browser-manager'; import { resolveServerScript } from '../src/cli'; -import { handleReadCommand as _handleReadCommand } from '../src/read-commands'; +import { handleReadCommand as _handleReadCommand, parseOutArgs, hasOutArg, resultToString } from '../src/read-commands'; import { handleWriteCommand as _handleWriteCommand } from '../src/write-commands'; import { handleMetaCommand } from '../src/meta-commands'; import { consoleBuffer, networkBuffer, dialogBuffer, addConsoleEntry, addNetworkEntry, addDialogEntry, CircularBuffer } from '../src/buffers'; @@ -23,6 +23,65 @@ const handleReadCommand = (cmd: string, args: string[], b: BrowserManager) => const handleWriteCommand = (cmd: string, args: string[], b: BrowserManager) => _handleWriteCommand(cmd, args, b.getActiveSession(), b); +// ─── Pure arg-parser + result-conversion unit tests (no browser) ─── +describe('parseOutArgs / hasOutArg', () => { + test('--out splits the flag from the positional', () => { + expect(parseOutArgs(['expr', '--out', '/tmp/x'])).toEqual({ outPath: '/tmp/x', raw: false, rest: ['expr'] }); + }); + + test('--out= form is equivalent', () => { + expect(parseOutArgs(['expr', '--out=/tmp/x'])).toEqual({ outPath: '/tmp/x', raw: false, rest: ['expr'] }); + }); + + test('flag ordering does not matter', () => { + expect(parseOutArgs(['--out', '/tmp/x', 'expr'])).toEqual({ outPath: '/tmp/x', raw: false, rest: ['expr'] }); + }); + + test('--raw and --raw=true|false', () => { + expect(parseOutArgs(['e', '--out', '/tmp/x', '--raw']).raw).toBe(true); + expect(parseOutArgs(['e', '--out', '/tmp/x', '--raw=true']).raw).toBe(true); + expect(parseOutArgs(['e', '--out', '/tmp/x', '--raw=false']).raw).toBe(false); + }); + + test('repeated --out throws', () => { + expect(() => parseOutArgs(['e', '--out', '/a', '--out', '/b'])).toThrow(/more than once/); + }); + + test('--out with a missing value throws', () => { + expect(() => parseOutArgs(['e', '--out'])).toThrow(/requires a file path/); + expect(() => parseOutArgs(['e', '--out', '--raw'])).toThrow(/requires a file path/); + expect(() => parseOutArgs(['e', '--out='])).toThrow(/requires a file path/); + }); + + test('bad --raw value throws', () => { + expect(() => parseOutArgs(['e', '--out', '/a', '--raw=maybe'])).toThrow(/--raw must be true or false/); + }); + + test('hasOutArg matches --out and --out= exactly, not lookalikes', () => { + expect(hasOutArg(['a', '--out', 'b'])).toBe(true); + expect(hasOutArg(['a', '--out=b'])).toBe(true); + expect(hasOutArg(['a'])).toBe(false); + expect(hasOutArg(['a', '--output', 'b'])).toBe(false); + expect(hasOutArg(['a', '--outx'])).toBe(false); + }); +}); + +describe('resultToString — byte-for-byte with pre-refactor behavior', () => { + test('null becomes "null" (typeof null === object → JSON.stringify)', () => { + expect(resultToString(null)).toBe('null'); + }); + test('undefined becomes empty string', () => { + expect(resultToString(undefined)).toBe(''); + }); + test('objects are pretty-printed JSON', () => { + expect(resultToString({ a: 1 })).toBe(JSON.stringify({ a: 1 }, null, 2)); + }); + test('primitives use String()', () => { + expect(resultToString(42)).toBe('42'); + expect(resultToString(true)).toBe('true'); + }); +}); + let testServer: ReturnType; let bm: BrowserManager; let baseUrl: string; @@ -225,6 +284,102 @@ describe('Inspection', () => { expect(result).toBe('3'); }); + // ─── js/eval --out (render-to-file) ─────────────────────────── + + test('js (no --out) returns a multi-MB string without truncation', async () => { + // Handler-level guarantee: the result is not sliced/capped before return. + // (Full HTTP egress path is exercised elsewhere; this pins the handler.) + const result = await handleReadCommand('js', ["'x'.repeat(3 * 1024 * 1024)"], bm); + expect(result.length).toBe(3 * 1024 * 1024); + }); + + test('js --out writes the result to disk and returns a short status, not the payload', async () => { + const out = `/tmp/browse-out-large-${Date.now()}.txt`; + try { + const result = await handleReadCommand('js', ["'y'.repeat(2 * 1024 * 1024)", '--out', out], bm); + expect(result).toContain('JS result written:'); + expect(result).toContain(out); + expect(result).toContain(`(${2 * 1024 * 1024} bytes)`); + expect(result.length).toBeLessThan(200); // status, not the 2MB payload + expect(fs.statSync(out).size).toBe(2 * 1024 * 1024); + } finally { + fs.rmSync(out, { force: true }); + } + }); + + test('js --out decodes a base64 PNG data URL to real bytes', async () => { + // 1x1 transparent PNG. + const b64 = 'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=='; + const out = `/tmp/browse-out-png-${Date.now()}.png`; + try { + const result = await handleReadCommand('js', [`'data:image/png;base64,' + '${b64}'`, '--out', out], bm); + const buf = fs.readFileSync(out); + // PNG magic bytes: 89 50 4E 47 + expect([buf[0], buf[1], buf[2], buf[3]]).toEqual([0x89, 0x50, 0x4e, 0x47]); + const expectedLen = Buffer.from(b64, 'base64').length; + expect(buf.length).toBe(expectedLen); + expect(result).toContain(`(${expectedLen} bytes)`); + } finally { + fs.rmSync(out, { force: true }); + } + }); + + test('js --out --raw writes the literal data-URL string (no decode)', async () => { + const dataUrl = 'data:text/plain;base64,aGVsbG8='; + const out = `/tmp/browse-out-raw-${Date.now()}.txt`; + try { + await handleReadCommand('js', [`'${dataUrl}'`, '--out', out, '--raw'], bm); + expect(fs.readFileSync(out, 'utf-8')).toBe(dataUrl); + } finally { + fs.rmSync(out, { force: true }); + } + }); + + test('js --out throws on a malformed base64 data URL instead of writing corrupt bytes', async () => { + const out = `/tmp/browse-out-bad-${Date.now()}.png`; + try { + await expect( + handleReadCommand('js', ["'data:image/png;base64,!!!not-base64!!!'", '--out', out], bm) + ).rejects.toThrow(/malformed base64/); + expect(fs.existsSync(out)).toBe(false); + } finally { + fs.rmSync(out, { force: true }); + } + }); + + test('js --out rejects a path outside the safe directories', async () => { + await expect( + handleReadCommand('js', ['1 + 1', '--out', '/etc/browse-should-not-write.txt'], bm) + ).rejects.toThrow(); + }); + + test('js --out creates a missing parent directory', async () => { + // validateOutputPath resolves the parent's realpath, so it permits one level + // of missing dir under a safe root (/tmp). mkdir then materializes it. + const root = `/tmp/browse-out-nested-${Date.now()}`; + const out = `${root}/result.txt`; + try { + await handleReadCommand('js', ["'nested'", '--out', out], bm); + expect(fs.readFileSync(out, 'utf-8')).toBe('nested'); + } finally { + fs.rmSync(root, { recursive: true, force: true }); + } + }); + + test('eval --out writes the file result to disk (parity with js)', async () => { + const script = `/tmp/browse-eval-out-src-${Date.now()}.js`; + const out = `/tmp/browse-eval-out-${Date.now()}.txt`; + fs.writeFileSync(script, "'from eval'"); + try { + const result = await handleReadCommand('eval', [script, '--out', out], bm); + expect(result).toContain('Eval result written:'); + expect(fs.readFileSync(out, 'utf-8')).toBe('from eval'); + } finally { + fs.rmSync(script, { force: true }); + fs.rmSync(out, { force: true }); + } + }); + test('css returns computed property', async () => { const result = await handleReadCommand('css', ['h1', 'color'], bm); // Navy color diff --git a/browse/test/tunnel-gate-unit.test.ts b/browse/test/tunnel-gate-unit.test.ts index f6d61c13a..6fcdd9e51 100644 --- a/browse/test/tunnel-gate-unit.test.ts +++ b/browse/test/tunnel-gate-unit.test.ts @@ -95,3 +95,35 @@ describe('canDispatchOverTunnel — alias canonicalization', () => { expect(canDispatchOverTunnel('closetab')).toBe(true); }); }); + +describe('canDispatchOverTunnel — --out writes are never tunnel-dispatchable', () => { + // `--out` turns an otherwise-readable command into a local-disk WRITE. The + // tunnel surface never grants disk-write to remote paired agents, so any + // --out invocation must be 403'd even when the bare command is allowlisted. + test('bare eval dispatches, but eval --out does not', () => { + expect(canDispatchOverTunnel('eval', ['/tmp/x.js'])).toBe(true); + expect(canDispatchOverTunnel('eval', ['/tmp/x.js', '--out', '/tmp/o.png'])).toBe(false); + }); + + test('--out= form is rejected too (no parser-shape bypass)', () => { + expect(canDispatchOverTunnel('eval', ['/tmp/x.js', '--out=/tmp/o.png'])).toBe(false); + }); + + test('--out anywhere in args is caught regardless of ordering', () => { + expect(canDispatchOverTunnel('eval', ['--out', '/tmp/o.png', '/tmp/x.js'])).toBe(false); + }); + + test('args without --out still dispatch', () => { + expect(canDispatchOverTunnel('goto', ['https://example.com'])).toBe(true); + expect(canDispatchOverTunnel('eval', ['/tmp/x.js'])).toBe(true); + }); + + test('omitting args preserves the old command-only behavior', () => { + expect(canDispatchOverTunnel('eval')).toBe(true); + }); + + test('a lookalike flag (--output) is NOT treated as --out', () => { + // hasOutArg matches '--out' exactly or '--out='; '--output' must not trip it. + expect(canDispatchOverTunnel('eval', ['/tmp/x.js', '--output', '/tmp/o'])).toBe(true); + }); +}); diff --git a/gstack/llms.txt b/gstack/llms.txt index a11b045d1..9f9f717ec 100644 --- a/gstack/llms.txt +++ b/gstack/llms.txt @@ -81,10 +81,10 @@ Run with `browse [args]`. Full reference: `browse/SKILL.md`. - `cookies`: All cookies as JSON - `css `: Computed CSS value - `dialog [--clear]`: Dialog messages -- `eval `: Run JavaScript from a file in the page context and return result as string. +- `eval [--out ] [--raw]`: Run JavaScript from a file in the page context and return result as string. - `inspect [selector] [--all] [--history]`: Deep CSS inspection via CDP — full rule cascade, box model, computed styles - `is `: State check on element. -- `js `: Run inline JavaScript expression in the page context and return result as string. +- `js [--out ] [--raw]`: Run inline JavaScript expression in the page context and return result as string. - `network [--clear]`: Network requests - `perf`: Page load timings - `storage | storage set `: Read both localStorage and sessionStorage as JSON. diff --git a/package.json b/package.json index 229d7034c..789aa8db8 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gstack", - "version": "1.57.7.0", + "version": "1.57.8.0", "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.", "license": "MIT", "type": "module", From 8241949357263be64013a8410171def68cff920c Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Tue, 9 Jun 2026 22:29:23 -0700 Subject: [PATCH 2/2] v1.57.9.0 feat: source-clean gbrain render (dev-setup --out-dir + machine-wide gbrain-refresh) (#1951) * feat(gbrain-detect): add --is-ok live-detection exit-code gate Single source of truth for 'is gbrain usable'. Runs live detection (never reads the possibly-stale gbrain-detection.json) and exits 0 iff status is ok, so setup, bin/dev-setup, and gstack-config can gate brain-aware rendering on one shared check instead of re-grepping the JSON. Co-Authored-By: Claude Opus 4.8 (1M context) * feat(gen-skill-docs): add --out-dir with surgical section-path rewrite --out-dir mirrors the Claude skill tree (SKILL.md + sections) into a separate directory instead of writing in place, and rewrites the literal section-base path (~/.claude/skills/gstack//sections/) in generated content to point at the out-dir. The rewrite is surgical: only /sections/ paths move; bin/, browse/, docs/ references stay pointed at the global install. Global extras (proactive-suggestions.json) are skipped in out-dir mode. Default (no flag) behavior is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) * feat(dev-setup): render gbrain :user variant to an untracked workspace dir Stops the dev/Conductor workspace from dirtying tracked SKILL.md source. setup honors GSTACK_SKIP_GBRAIN_REGEN (passed inline by dev-setup, never exported) and skips the in-place :user regen; detection is still persisted (PID-unique tmp so concurrent workspaces can't clobber it). dev-setup instead renders the :user variant into .claude/gstack-rendered (gitignored, per-workspace) and repoints the workspace SKILL.md symlinks at it, so the workspace gets brain-aware blocks while the worktree stays canonical. dev-teardown removes the render. Co-Authored-By: Claude Opus 4.8 (1M context) * feat(dev-skill): refresh the untracked brain-aware render on template change After the default in-place regen (which keeps the worktree canonical and runs validation), also re-render the :user variant into .claude/gstack-rendered when it exists, so live template edits reflect at the workspace's runtime. Never creates the render dir during plain template dev. Co-Authored-By: Claude Opus 4.8 (1M context) * feat(gstack-config): gbrain-refresh renders brain-aware blocks into the install Extends gbrain-refresh to render the :user variant into the global install (~/.claude/skills/gstack) so every project's Claude sessions get brain-aware blocks, not just the gstack dev workspace. Guarded against mutating the wrong directory: the target must exist, not be a symlink (a symlinked install points at a dev worktree), and look like a real gstack clone (VERSION + package.json). Idempotent and self-documenting. CLAUDE.md's deploy section now notes that 'git reset --hard' reverts the blocks and to re-run gbrain-refresh. Co-Authored-By: Claude Opus 4.8 (1M context) * test: cover gstack-gbrain-detect --is-ok + dev-skill render refresh Fills the two automated-coverage gaps from the eng review: --is-ok exit-code gate (no-cli -> nonzero, healthy -> 0, plus an agrees-with-JSON no-skew check reusing the deterministic fake-gbrain harness) and a static tripwire that dev-skill re-renders the :user variant into the workspace render dir only when it already exists. Co-Authored-By: Claude Opus 4.8 (1M context) * chore: bump version and changelog (v1.57.9.0) Co-Authored-By: Claude Opus 4.8 (1M context) * docs: document brain-aware dev-setup render for v1.57.9.0 Co-Authored-By: Claude Opus 4.8 (1M context) --------- Co-authored-by: Claude Opus 4.8 (1M context) --- .gitignore | 1 + CHANGELOG.md | 71 +++++++++++++++++ CLAUDE.md | 6 ++ CONTRIBUTING.md | 20 ++++- VERSION | 2 +- bin/dev-setup | 46 ++++++++++- bin/dev-teardown | 9 ++- bin/gstack-config | 25 +++++- bin/gstack-gbrain-detect | 10 +++ package.json | 2 +- scripts/dev-skill.ts | 18 +++++ scripts/gen-skill-docs.ts | 59 +++++++++++++- setup | 33 +++++--- test/dev-setup-render-isolation.test.ts | 91 ++++++++++++++++++++++ test/gbrain-detect-shape.test.ts | 75 +++++++++++++++++- test/gbrain-refresh-install-render.test.ts | 60 ++++++++++++++ test/gen-skill-docs-out-dir.test.ts | 84 ++++++++++++++++++++ 17 files changed, 590 insertions(+), 22 deletions(-) create mode 100644 test/dev-setup-render-isolation.test.ts create mode 100644 test/gbrain-refresh-install-render.test.ts create mode 100644 test/gen-skill-docs-out-dir.test.ts diff --git a/.gitignore b/.gitignore index 42b2c2a04..6eac08f36 100644 --- a/.gitignore +++ b/.gitignore @@ -7,6 +7,7 @@ make-pdf/dist/ bin/gstack-global-discover* .gstack/ .claude/skills/ +.claude/gstack-rendered/ .claude/scheduled_tasks.lock .claude/*.lock .agents/ diff --git a/CHANGELOG.md b/CHANGELOG.md index 438536cad..3f8cffae1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,76 @@ # Changelog +## [1.57.9.0] - 2026-06-09 + +## **Your gstack checkout stays clean when gbrain is installed.** +## **Brain-aware skill blocks render to an untracked spot, never into tracked source.** + +Before this, finishing a Conductor or dev-workspace setup with gbrain installed +rewrote 16 planning and review SKILL.md files in place, adding 326 lines of +brain-aware blocks straight into tracked source. Your working tree came back dirty, +one stray `git add` away from committing a token regression for everyone who does +not run gbrain. Now `gen-skill-docs --out-dir` renders the brain-aware variant into +an untracked per-workspace directory, and `bin/dev-setup` repoints the workspace's +skill symlinks at it. The dev workspace gets the full gbrain experience (context-load +and save-to-brain blocks live at runtime), while the tracked SKILL.md files stay +byte-for-byte canonical. To turn the blocks on across all your projects' Claude +sessions, `gstack-config gbrain-refresh` now renders them into your global install, +guarded so it never mutates a symlinked or non-gstack directory. + +### The numbers that matter + +Structural facts of the change, verifiable from the diff plus `bun run gen:skill-docs` +(zero drift) and the new behavioral test (`test/gen-skill-docs-out-dir.test.ts`). + +| When gbrain is installed | Before | After | +|---|---|---| +| Tracked SKILL.md files dirtied by dev-setup | 16 (+326 lines) | 0 | +| Where brain-aware blocks render in a dev workspace | in-place, tracked source | `.claude/gstack-rendered/`, untracked | +| Brain-aware blocks across other projects | re-run `./setup` or hand-edit | `gstack-config gbrain-refresh` (idempotent) | +| "Is gbrain usable" check | per-caller JSON grep, can read stale state | `gstack-gbrain-detect --is-ok` (one live gate) | + +The section-path rewrite is surgical: only `~/.claude/skills/gstack//sections/` +references move to the render dir, so `bin/` and `docs/` references still resolve to +the install. + +### What this means for you + +If you develop gstack with gbrain on, `git status` is clean again after setup, and +you can stop fishing brain-block drift out of your commits. After a +`git reset --hard` deploy of your install, re-run `gstack-config gbrain-refresh` to +restore the machine-wide blocks (it is idempotent, and the deploy note in CLAUDE.md +spells this out). + +### Itemized changes + +#### Added +- `gen-skill-docs --out-dir `: render the Claude SKILL.md + sections into a + separate directory instead of in place, rewriting only the section-base path so + section reads resolve to the render. Default (no flag) output is unchanged. +- `gstack-gbrain-detect --is-ok`: live-detection exit-code gate (0 iff gbrain is + usable), so setup, dev-setup, and gstack-config share one check. +- `gstack-config gbrain-refresh` now renders brain-aware blocks into the global + install (`~/.claude/skills/gstack`), guarded against symlinked or non-gstack + targets and self-documenting about the `reset --hard` re-run cycle. + +#### Changed +- `bin/dev-setup` renders the brain-aware variant into `.claude/gstack-rendered` + (gitignored) and repoints workspace skill symlinks at it; the worktree stays + canonical. `GSTACK_SKIP_GBRAIN_REGEN` is passed inline to the nested setup, never + exported. +- `setup` honors `GSTACK_SKIP_GBRAIN_REGEN` (skips the in-place brain regen on dev + trees) and writes detection state to a PID-unique tmp so concurrent workspaces + cannot clobber it. +- `scripts/dev-skill.ts` refreshes the workspace render on template change, only + when the render dir already exists. +- `bin/dev-teardown` removes the untracked render. + +#### For contributors +- New tests: `test/gen-skill-docs-out-dir.test.ts` (behavioral: worktree unchanged, + blocks rendered, section paths rewritten), `test/dev-setup-render-isolation.test.ts` + and `test/gbrain-refresh-install-render.test.ts` (static tripwires), plus + `--is-ok` coverage in `test/gbrain-detect-shape.test.ts`. + ## [1.57.8.0] - 2026-06-09 ## **`browse` is now the one Chromium on the box, for offline rendering too.** diff --git a/CLAUDE.md b/CLAUDE.md index 41db0093e..03384ae79 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -883,6 +883,12 @@ The active skill lives at `~/.claude/skills/gstack/`. After making changes: 2. Fetch and reset in the skill directory: `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main` 3. Rebuild: `cd ~/.claude/skills/gstack && bun run build` +**If you use gbrain:** the `git reset --hard` in step 2 reverts the brain-aware +(`GBRAIN_CONTEXT_LOAD` / `GBRAIN_SAVE_RESULTS`) blocks that `gstack-config +gbrain-refresh` renders into the install (those generated blocks differ from +`main` by design). After deploying, re-run `gstack-config gbrain-refresh` to +restore them across all your projects' Claude sessions. It's idempotent. + Or copy the binaries directly: - `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse` - `cp design/dist/design ~/.claude/skills/gstack/design/dist/design` diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a4872fc47..5a56ef5d3 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -106,6 +106,22 @@ bun run build bin/dev-teardown ``` +### Brain-aware blocks in a dev workspace (gbrain installed) + +If gbrain is installed and usable (`bin/gstack-gbrain-detect --is-ok` exits 0), +`bin/dev-setup` keeps your tracked `SKILL.md` files canonical and renders the +brain-aware variant (the `GBRAIN_CONTEXT_LOAD` / `GBRAIN_SAVE_RESULTS` blocks) +into `.claude/gstack-rendered/` (gitignored, per-workspace). It then repoints the +workspace's `SKILL.md` symlinks at that render, so your Claude sessions get the +full gbrain experience while `git status` stays clean. Under the hood, dev-setup +passes `GSTACK_SKIP_GBRAIN_REGEN=1` inline to the nested `./setup` (so it never +dirties tracked source) and runs `gen:skill-docs:user --out-dir .claude/gstack-rendered`, +which rewrites only the section-base paths to point at the render. `bin/dev-teardown` +removes the render. To make the blocks live across your *other* projects' Claude +sessions, run `gstack-config gbrain-refresh`, which renders them into the global +install (`~/.claude/skills/gstack`), guarded so it never touches a symlinked or +non-gstack directory. + ## Testing & evals ### Setup @@ -334,8 +350,8 @@ If you're using [Conductor](https://conductor.build) to run multiple Claude Code | Hook | Script | What it does | |------|--------|-------------| -| `setup` | `bin/dev-setup` | Copies `.env` from main worktree, installs deps, symlinks skills, runs `./setup` non-interactively | -| `archive` | `bin/dev-teardown` | Removes skill symlinks, cleans up `.claude/` directory | +| `setup` | `bin/dev-setup` | Copies `.env` from main worktree, installs deps, symlinks skills, runs `./setup` non-interactively, and (if gbrain is installed) renders brain-aware blocks into `.claude/gstack-rendered/` without dirtying tracked source | +| `archive` | `bin/dev-teardown` | Removes skill symlinks, the `.claude/gstack-rendered/` render, and cleans up `.claude/` directory | When Conductor creates a new workspace, `bin/dev-setup` runs automatically. It detects the main worktree (via `git worktree list`), copies your `.env` so API keys carry over, and sets up dev mode — no manual steps needed. diff --git a/VERSION b/VERSION index caf2638d9..10521b5e0 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -1.57.8.0 +1.57.9.0 diff --git a/bin/dev-setup b/bin/dev-setup index 0d8460f91..00a286706 100755 --- a/bin/dev-setup +++ b/bin/dev-setup @@ -72,7 +72,48 @@ fi # no-op skip (no install, no decline marker). A dev workspace must never mutate # global settings.json. To install the hooks, run `./setup --plan-tune-hooks` # directly (outside dev-setup). Saved prefix/other config preferences still apply. -"$GSTACK_LINK/setup" --plan-tune-hooks=prompt /dev/null; then + echo "" + echo "gbrain detected — rendering brain-aware skills into .claude/gstack-rendered (workspace-only, untracked)..." + rm -rf "$RENDER_DIR" + if ( cd "$REPO_ROOT" && bun run gen:skill-docs:user --host claude --out-dir "$RENDER_DIR" >/dev/null 2>&1 ); then + # Repoint each project-local SKILL.md symlink whose worktree target has a + # rendered counterpart. The skill DIRECTORY name (basename of the symlink + # target's dir) maps to RENDER_DIR//SKILL.md, which is robust to + # frontmatter renames and the gstack- prefix on the link name. + repointed=0 + for skill_link in "$REPO_ROOT"/.claude/skills/*/SKILL.md; do + [ -L "$skill_link" ] || continue + target="$(readlink "$skill_link")" + skilldir="$(basename "$(dirname "$target")")" + rendered="$RENDER_DIR/$skilldir/SKILL.md" + if [ -f "$rendered" ]; then ln -snf "$rendered" "$skill_link"; repointed=$((repointed + 1)); fi + done + echo " $repointed workspace skills now serve brain-aware blocks (worktree stays canonical)." + else + echo " warning: brain-aware render failed — workspace uses canonical skills." + fi +fi echo "" echo "Dev mode active. Skills resolve from this working tree." @@ -80,4 +121,7 @@ echo " .claude/skills/gstack → $REPO_ROOT" echo " .agents/skills/gstack → $REPO_ROOT" echo "Edit any SKILL.md and test immediately — no copy/deploy needed." echo "" +echo "To make brain-aware blocks live across your OTHER projects too, run:" +echo " gstack-config gbrain-refresh" +echo "" echo "To tear down: bin/dev-teardown" diff --git a/bin/dev-teardown b/bin/dev-teardown index dc8f74260..06189e189 100755 --- a/bin/dev-teardown +++ b/bin/dev-teardown @@ -24,9 +24,16 @@ if [ -d "$CLAUDE_SKILLS" ]; then fi rmdir "$CLAUDE_SKILLS" 2>/dev/null || true - rmdir "$REPO_ROOT/.claude" 2>/dev/null || true fi +# ─── Clean up the untracked brain-aware render (bin/dev-setup step 7) ── +RENDER_DIR="$REPO_ROOT/.claude/gstack-rendered" +if [ -d "$RENDER_DIR" ]; then + rm -rf "$RENDER_DIR" + removed+=("claude/gstack-rendered") +fi +rmdir "$REPO_ROOT/.claude" 2>/dev/null || true + # ─── Clean up .agents/skills/ ──────────────────────────────── AGENTS_SKILLS="$REPO_ROOT/.agents/skills" if [ -d "$AGENTS_SKILLS" ]; then diff --git a/bin/gstack-config b/bin/gstack-config index 01defcef8..ec465b281 100755 --- a/bin/gstack-config +++ b/bin/gstack-config @@ -396,8 +396,29 @@ case "${1:-}" in case "$STATUS" in ok) - echo "Detected gbrain v$VERSION → brain-aware blocks will render in planning-skill SKILL.md files." - echo "Run 'bun run gen:skill-docs' in the gstack repo (or re-run ./setup) to regenerate now." + echo "Detected gbrain v$VERSION." + # Render brain-aware blocks INTO the global install so EVERY project's + # Claude sessions get them (other projects read SKILL.md + sections from + # ~/.claude/skills/gstack via absolute paths baked at gen time). Guards + # (never mutate an arbitrary directory): the target must exist, not be a + # symlink (a symlinked install points at a dev worktree — rendering there + # would dirty tracked source), and look like a real gstack clone. + INSTALL_DIR="$HOME/.claude/skills/gstack" + if [ ! -d "$INSTALL_DIR" ]; then + echo "No global install at $INSTALL_DIR — nothing to render. (Dev workspaces get blocks via bin/dev-setup.)" + elif [ -L "$INSTALL_DIR" ]; then + echo "Skip: $INSTALL_DIR is a symlink (likely a dev worktree). Rendering there would dirty tracked source — run bin/dev-setup in that worktree instead." + elif [ ! -f "$INSTALL_DIR/VERSION" ] || [ ! -f "$INSTALL_DIR/package.json" ]; then + echo "Skip: $INSTALL_DIR doesn't look like a gstack clone (missing VERSION/package.json) — refusing to modify it." + elif ! command -v bun >/dev/null 2>&1; then + echo "Skip: bun not on PATH — can't render. Install bun, then re-run 'gstack-config gbrain-refresh'." + elif ( cd "$INSTALL_DIR" && bun run gen:skill-docs:user --host claude >/dev/null 2>&1 ); then + echo "Rendered brain-aware blocks into $INSTALL_DIR — now live across all your projects' Claude sessions." + echo "Note: this dirties the install's git tree (generated blocks differ from main, by design)." + echo " A 'git reset --hard origin/main' there reverts them; re-run 'gstack-config gbrain-refresh' to restore." + else + echo "Warning: render failed. Run 'cd $INSTALL_DIR && bun run gen:skill-docs:user --host claude' manually to see the error." + fi ;; *) echo "gbrain not detected (local-status: $STATUS) → brain-aware blocks will be suppressed in planning-skill SKILL.md files." diff --git a/bin/gstack-gbrain-detect b/bin/gstack-gbrain-detect index 897bec243..4eee753da 100755 --- a/bin/gstack-gbrain-detect +++ b/bin/gstack-gbrain-detect @@ -234,4 +234,14 @@ function main(): void { process.stdout.write(JSON.stringify(out, null, 2) + "\n"); } +// --is-ok: live engine-status gate. Exits 0 iff gbrain is usable ("ok"), 1 +// otherwise. Runs detection live (never reads the possibly-stale +// gbrain-detection.json), so callers — setup, bin/dev-setup, and +// `gstack-config gbrain-refresh` — can decide whether to render the gbrain +// :user variant without duplicating the JSON grep. Prints nothing on stdout. +if (process.argv.includes("--is-ok")) { + const noCache = process.env.GSTACK_DETECT_NO_CACHE === "1"; + process.exit(localEngineStatus({ noCache }) === "ok" ? 0 : 1); +} + main(); diff --git a/package.json b/package.json index 789aa8db8..53da1d736 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gstack", - "version": "1.57.8.0", + "version": "1.57.9.0", "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.", "license": "MIT", "type": "module", diff --git a/scripts/dev-skill.ts b/scripts/dev-skill.ts index ae6ba30ad..f585f57b5 100644 --- a/scripts/dev-skill.ts +++ b/scripts/dev-skill.ts @@ -50,6 +50,24 @@ function regenerateAndValidate() { console.log(` [check] \u2705 ${output} — ${totalValid} commands, all valid`); } } + + // Dev workspace render isolation: the default in-place regen above keeps the + // worktree canonical. If bin/dev-setup set up an untracked brain-aware render + // (.claude/gstack-rendered), refresh it too so live template edits reflect at + // this workspace's runtime. Only runs when the render dir already exists — we + // never create it during plain template dev. + const RENDER_DIR = path.join(ROOT, '.claude', 'gstack-rendered'); + if (fs.existsSync(RENDER_DIR)) { + try { + execSync( + `bun run scripts/gen-skill-docs.ts --respect-detection --host claude --out-dir ${JSON.stringify(RENDER_DIR)}`, + { cwd: ROOT, stdio: 'pipe' }, + ); + console.log(' [render] refreshed .claude/gstack-rendered (brain-aware workspace copy)'); + } catch (err: any) { + console.log(` [render] ERROR: ${err.stderr?.toString().trim() || err.message}`); + } + } } // Initial run diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 5fea07713..45f617a1b 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -137,6 +137,39 @@ const EXPLAIN_LEVEL: 'default' | 'terse' = (() => { return val; })(); +// ─── Out-dir (dev workspace render isolation) ─────────────── +// --out-dir redirects Claude SKILL.md + section output to a separate +// (untracked) directory instead of writing in place, AND rewrites the literal +// section-base path (`~/.claude/skills/gstack//sections/`) inside the +// generated content to point at the out-dir, so section Reads resolve to the +// rendered copy rather than the global install. Used by bin/dev-setup to render +// the gbrain `:user` variant for a Conductor workspace without dirtying tracked +// source. Default (unset) = in-place, behavior unchanged. Claude host only. +const OUT_DIR_ARG = process.argv.find(a => a.startsWith('--out-dir')); +const OUT_DIR: string | null = (() => { + if (!OUT_DIR_ARG) return null; + const val = OUT_DIR_ARG.includes('=') + ? OUT_DIR_ARG.split('=')[1] + : process.argv[process.argv.indexOf(OUT_DIR_ARG) + 1]; + if (!val) throw new Error('--out-dir requires a directory path'); + return path.resolve(val); +})(); + +/** + * When rendering to an out-dir, repoint the literal section-base path at the + * out-dir so section Reads resolve to the rendered copy, not the global install. + * Surgical: ONLY paths containing `/sections/` are rewritten — bin/, browse/, + * docs/ references keep pointing at `~/.claude/skills/gstack` (the global + * install, which still works). No-op when --out-dir is unset. + */ +function rewriteSectionBase(content: string): string { + if (!OUT_DIR) return content; + return content.replace( + /~\/\.claude\/skills\/gstack\/([^\s)`"'*]+\/sections\/)/g, + `${OUT_DIR}/$1`, + ); +} + // HostPaths, HOST_PATHS, and TemplateContext imported from ./resolvers/types (line 7-8) // Design constants (AI_SLOP_BLACKLIST, OPENAI_HARD_REJECTIONS, OPENAI_LITMUS_CHECKS) // live in ./resolvers/constants and are consumed by resolvers directly. @@ -768,6 +801,12 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath: // Determine skill directory relative to ROOT const skillDir = path.relative(ROOT, path.dirname(tmplPath)); + // --out-dir (Claude only): mirror the skill tree into the out-dir instead of + // writing in place. External hosts compute their own paths below. + if (OUT_DIR && host === 'claude') { + outputPath = path.join(OUT_DIR, skillDir, path.basename(tmplPath).replace(/\.tmpl$/, '')); + } + // Extract name/description: name drives external skill naming + setup symlinks // (and TemplateContext.skillName via buildContext); description feeds external // host metadata. When frontmatter name: differs from directory name (e.g. @@ -822,6 +861,9 @@ function processTemplate(tmplPath: string, host: Host = 'claude'): { outputPath: } } + // --out-dir: repoint section-base paths to the out-dir (no-op otherwise). + if (host === 'claude') content = rewriteSectionBase(content); + return { outputPath, content, symlinkLoop, catalogParts }; } @@ -860,6 +902,10 @@ function processSectionTemplate( // External hosts: rewrite cross-reference paths/tools (no frontmatter to transform). if (host !== 'claude') { content = applyHostRewrites(content, hostConfig); + } else { + // --out-dir: a section may cross-reference another section by absolute path; + // repoint those to the out-dir too (no-op when --out-dir is unset). + content = rewriteSectionBase(content); } // Plain generated header (no frontmatter to insert after). @@ -868,7 +914,7 @@ function processSectionTemplate( const fileName = path.basename(sectionTmplPath).replace(/\.tmpl$/, ''); let outputPath: string; if (host === 'claude') { - outputPath = path.join(ROOT, skillDir, 'sections', fileName); + outputPath = path.join(OUT_DIR || ROOT, skillDir, 'sections', fileName); } else { const externalName = externalSkillName(skillDir, parentName); outputPath = path.join(ROOT, hostConfig.hostSubdir, 'skills', externalName, 'sections', fileName); @@ -933,7 +979,7 @@ for (const currentHost of hostsToRun) { voice_line: catalogParts.voiceLine, }; } - const relOutput = path.relative(ROOT, outputPath); + const relOutput = path.relative(OUT_DIR || ROOT, outputPath); if (symlinkLoop) { console.log(`SKIPPED (symlink loop): ${relOutput}`); @@ -946,6 +992,9 @@ for (const currentHost of hostsToRun) { console.log(`FRESH: ${relOutput}`); } } else { + // In-place writes land in existing dirs; --out-dir needs the mirrored + // skill dir created first. + if (OUT_DIR) fs.mkdirSync(path.dirname(outputPath), { recursive: true }); fs.writeFileSync(outputPath, content); console.log(`GENERATED: ${relOutput}`); } @@ -982,7 +1031,7 @@ for (const currentHost of hostsToRun) { currentHostConfig.generation.skipSkills.includes(sec.skillDir)) continue; const { outputPath, content } = processSectionTemplate(path.join(ROOT, sec.tmpl), sec.skillDir, currentHost); - const relOutput = path.relative(ROOT, outputPath); + const relOutput = path.relative(OUT_DIR || ROOT, outputPath); if (DRY_RUN) { const existing = fs.existsSync(outputPath) ? fs.readFileSync(outputPath, 'utf-8') : ''; @@ -1079,7 +1128,9 @@ The orchestrator will persist the plan link to its own memory/knowledge store. // No timestamp field — keeps the file content-deterministic across runs so // CI dry-run freshness checks don't flap on regen. If a per-run timestamp // is ever needed for debugging, write it to a separate `.gen-stamp` file. - if (currentHost === 'claude' && CATALOG_MODE === 'trim' && Object.keys(proactiveAggregate).length > 0 && !DRY_RUN) { + // Skip the global proactive-suggestions.json in --out-dir mode: it lives at + // a repo path (scripts/) and the dev workspace render doesn't need it. + if (currentHost === 'claude' && CATALOG_MODE === 'trim' && Object.keys(proactiveAggregate).length > 0 && !DRY_RUN && !OUT_DIR) { const proactivePath = path.join(ROOT, 'scripts', 'proactive-suggestions.json'); // Sort keys alphabetically so the serialized JSON is identical across // machines regardless of filesystem-iteration order. Without this, CI diff --git a/setup b/setup index 0c180f7bf..ec1db22b7 100755 --- a/setup +++ b/setup @@ -1286,22 +1286,37 @@ fi DETECT_BIN="$SOURCE_GSTACK_DIR/bin/gstack-gbrain-detect" GBRAIN_STATE_DIR="${GSTACK_HOME:-$HOME/.gstack}" DETECTION_FILE="$GBRAIN_STATE_DIR/gbrain-detection.json" +# PID-unique tmp so concurrent setups (parallel Conductor workspaces) can't +# clobber each other's in-flight detection write. +DETECTION_TMP="$DETECTION_FILE.$$.tmp" mkdir -p "$GBRAIN_STATE_DIR" if [ -x "$DETECT_BIN" ]; then - if "$DETECT_BIN" > "$DETECTION_FILE.tmp" 2>/dev/null; then - mv "$DETECTION_FILE.tmp" "$DETECTION_FILE" - if grep -q '"gbrain_local_status": "ok"' "$DETECTION_FILE" 2>/dev/null; then - log "gbrain detected — regenerating Claude SKILL.md with brain-aware blocks (~250 token overhead per planning skill)..." - ( - cd "$SOURCE_GSTACK_DIR" - bun_cmd run gen:skill-docs:user --host claude 2>&1 | tail -3 - ) || log " warning: gen:skill-docs:user failed — run 'bun run gen:skill-docs:user' manually if you want brain-aware blocks" + if "$DETECT_BIN" > "$DETECTION_TMP" 2>/dev/null; then + mv "$DETECTION_TMP" "$DETECTION_FILE" + # Single source of truth for "is gbrain usable" — `--is-ok` runs live + # detection (exit 0 iff ok), so setup, bin/dev-setup, and gstack-config + # all gate on the same check instead of re-grepping the JSON. + if "$DETECT_BIN" --is-ok 2>/dev/null; then + if [ -n "${GSTACK_SKIP_GBRAIN_REGEN:-}" ]; then + # Dev/source tree (set by bin/dev-setup): never regenerate tracked + # SKILL.md in place — that dirties checked-in source. Detection is + # still persisted above; the dev workspace renders the :user variant + # into an untracked dir, and other projects get blocks via + # `gstack-config gbrain-refresh`. + log "gbrain detected — GSTACK_SKIP_GBRAIN_REGEN set: leaving tracked SKILL.md canonical (dev/source tree)." + else + log "gbrain detected — regenerating Claude SKILL.md with brain-aware blocks (~250 token overhead per planning skill)..." + ( + cd "$SOURCE_GSTACK_DIR" + bun_cmd run gen:skill-docs:user --host claude 2>&1 | tail -3 + ) || log " warning: gen:skill-docs:user failed — run 'bun run gen:skill-docs:user' manually if you want brain-aware blocks" + fi else log "gbrain not detected — brain-aware blocks suppressed in planning-skill SKILL.md files (zero token overhead)." log " To enable: install gbrain via /setup-gbrain, then re-run ./setup or 'gstack-config gbrain-refresh'." fi else - rm -f "$DETECTION_FILE.tmp" + rm -f "$DETECTION_TMP" log " warning: gstack-gbrain-detect failed — brain-aware blocks will stay suppressed" fi fi diff --git a/test/dev-setup-render-isolation.test.ts b/test/dev-setup-render-isolation.test.ts new file mode 100644 index 000000000..fbfeb790c --- /dev/null +++ b/test/dev-setup-render-isolation.test.ts @@ -0,0 +1,91 @@ +import { describe, test, expect } from 'bun:test'; +import * as path from 'path'; +import * as fs from 'fs'; + +// Static tripwires for the B2 render-isolation wiring. These fail CI if a +// refactor drops a load-bearing line, re-introducing the "dev-setup dirties +// tracked SKILL.md" drift (or worse, leaks the skip-guard into real installs). +const ROOT = path.resolve(import.meta.dir, '..'); +const read = (rel: string) => fs.readFileSync(path.join(ROOT, rel), 'utf-8'); + +describe('dev-setup: worktree stays canonical', () => { + const devSetup = read('bin/dev-setup'); + + test('passes GSTACK_SKIP_GBRAIN_REGEN inline on the nested setup call', () => { + expect(devSetup).toContain('GSTACK_SKIP_GBRAIN_REGEN=1 "$GSTACK_LINK/setup"'); + }); + + test('never exports GSTACK_SKIP_GBRAIN_REGEN (would leak into other setup paths)', () => { + expect(devSetup).not.toMatch(/export\s+GSTACK_SKIP_GBRAIN_REGEN/); + }); + + test('renders the :user variant into an out-dir, not in place', () => { + expect(devSetup).toContain('--out-dir'); + expect(devSetup).toContain('.claude/gstack-rendered'); + }); + + test('gates the render on gstack-gbrain-detect --is-ok', () => { + expect(devSetup).toContain('--is-ok'); + }); +}); + +describe('setup: honors GSTACK_SKIP_GBRAIN_REGEN', () => { + const setup = read('setup'); + + test('skips the in-place :user regen when the guard is set', () => { + expect(setup).toContain('${GSTACK_SKIP_GBRAIN_REGEN:-}'); + // The guard must wrap the in-place render, not the detection persist. + const idx = setup.indexOf('GSTACK_SKIP_GBRAIN_REGEN'); + const after = setup.slice(idx, idx + 600); + expect(after).toContain('leaving tracked SKILL.md canonical'); + }); + + test('uses a PID-unique detection tmp (no concurrent clobber)', () => { + expect(setup).toContain('$DETECTION_FILE.$$.tmp'); + }); + + test('gates detection on the shared --is-ok check', () => { + expect(setup).toContain('"$DETECT_BIN" --is-ok'); + }); +}); + +describe('gen-skill-docs: section rewrite is gated on --out-dir', () => { + const gen = read('scripts/gen-skill-docs.ts'); + + test('rewriteSectionBase is a no-op without --out-dir', () => { + expect(gen).toContain('function rewriteSectionBase'); + const idx = gen.indexOf('function rewriteSectionBase'); + const body = gen.slice(idx, idx + 400); + expect(body).toContain('if (!OUT_DIR) return content'); + expect(body).toContain('sections'); // surgical: regex targets only /sections/ paths + }); +}); + +describe('dev-teardown: removes the untracked render', () => { + const teardown = read('bin/dev-teardown'); + + test('rm -rf the gstack-rendered dir', () => { + expect(teardown).toContain('gstack-rendered'); + expect(teardown).toMatch(/rm -rf .*RENDER_DIR/); + }); +}); + +describe('.gitignore: render dir is declared untracked', () => { + test('.claude/gstack-rendered/ is ignored', () => { + expect(read('.gitignore')).toContain('.claude/gstack-rendered/'); + }); +}); + +describe('dev-skill: refreshes the render on template change', () => { + const devSkill = read('scripts/dev-skill.ts'); + + test('re-renders the :user variant into the workspace render dir', () => { + expect(devSkill).toContain('gstack-rendered'); + expect(devSkill).toContain('--out-dir'); + expect(devSkill).toContain('--respect-detection'); + }); + + test('only refreshes when the render dir already exists (never creates it during plain dev)', () => { + expect(devSkill).toContain('fs.existsSync(RENDER_DIR)'); + }); +}); diff --git a/test/gbrain-detect-shape.test.ts b/test/gbrain-detect-shape.test.ts index 465e55623..e2f67ee07 100644 --- a/test/gbrain-detect-shape.test.ts +++ b/test/gbrain-detect-shape.test.ts @@ -16,7 +16,7 @@ */ import { describe, it, expect } from "bun:test"; -import { execFileSync } from "child_process"; +import { execFileSync, spawnSync } from "child_process"; import { mkdtempSync, mkdirSync, @@ -47,6 +47,16 @@ function runDetect(env: Partial): string { }); } +/** Run detect with --is-ok and return its exit code (never throws). */ +function runIsOk(env: Partial): number { + const r = spawnSync(BUN_BIN, ["run", DETECT_BIN, "--is-ok"], { + timeout: 15_000, + stdio: ["ignore", "pipe", "pipe"], + env: { ...process.env, ...env }, + }); + return r.status ?? 1; +} + interface DetectShape { gbrain_on_path: boolean; gbrain_version: string | null; @@ -244,3 +254,66 @@ exit 0 } }); }); + +describe("bin/gstack-gbrain-detect --is-ok — live gate", () => { + it("exits non-zero when gbrain is not on PATH (no-cli)", () => { + const tmp = mkdtempSync(join(tmpdir(), "detect-isok-")); + try { + const code = runIsOk({ + HOME: tmp, + PATH: "/usr/bin:/bin", // no gbrain + GSTACK_HOME: tmp, + GSTACK_DETECT_NO_CACHE: "1", + }); + expect(code).not.toBe(0); + } finally { + rmSync(tmp, { recursive: true, force: true }); + } + }); + + it("exits 0 when a fake gbrain reports a healthy engine (ok)", () => { + const tmp = mkdtempSync(join(tmpdir(), "detect-isok-")); + const bindir = join(tmp, "bin"); + const home = join(tmp, "home"); + const configDir = join(home, ".gbrain"); + try { + mkdirSync(bindir, { recursive: true }); + mkdirSync(configDir, { recursive: true }); + writeFileSync(join(configDir, "config.json"), JSON.stringify({ engine: "pglite" })); + const fake = `#!/bin/sh +case "$1 $2" in + "--version ") echo "gbrain 0.33.1.0"; exit 0 ;; + "sources list") echo '{"sources":[]}'; exit 0 ;; + "doctor "*) echo '{"status":"ok","checks":[]}'; exit 0 ;; +esac +exit 0 +`; + const gbrainPath = join(bindir, "gbrain"); + writeFileSync(gbrainPath, fake); + chmodSync(gbrainPath, 0o755); + + const code = runIsOk({ + HOME: home, + PATH: `${bindir}:/usr/bin:/bin`, + GSTACK_HOME: tmp, + GSTACK_DETECT_NO_CACHE: "1", + }); + expect(code).toBe(0); + } finally { + rmSync(tmp, { recursive: true, force: true }); + } + }); + + it("exit code agrees with the JSON gbrain_local_status (no skew)", () => { + // Run both surfaces against the same env and assert they never disagree. + const tmp = mkdtempSync(join(tmpdir(), "detect-isok-")); + try { + const env = { HOME: tmp, PATH: "/usr/bin:/bin", GSTACK_HOME: tmp, GSTACK_DETECT_NO_CACHE: "1" }; + const status = (JSON.parse(runDetect(env)) as DetectShape).gbrain_local_status; + const code = runIsOk(env); + expect(code === 0).toBe(status === "ok"); + } finally { + rmSync(tmp, { recursive: true, force: true }); + } + }); +}); diff --git a/test/gbrain-refresh-install-render.test.ts b/test/gbrain-refresh-install-render.test.ts new file mode 100644 index 000000000..f1494d47d --- /dev/null +++ b/test/gbrain-refresh-install-render.test.ts @@ -0,0 +1,60 @@ +import { describe, test, expect } from 'bun:test'; +import * as path from 'path'; +import * as fs from 'fs'; + +// Static tripwires for the C (machine-wide) render in `gstack-config +// gbrain-refresh`. The render mutates the shared global install, so the guards +// that stop it from touching the wrong directory are load-bearing — these fail +// CI if any guard is dropped. +const ROOT = path.resolve(import.meta.dir, '..'); +const SRC = fs.readFileSync(path.join(ROOT, 'bin', 'gstack-config'), 'utf-8'); + +// Pull out just the gbrain-refresh `ok)` branch so assertions can't be +// satisfied by unrelated text elsewhere in the file. +function okBranch(): string { + const start = SRC.indexOf('gbrain-refresh)'); + const ok = SRC.indexOf('ok)', start); + const end = SRC.indexOf(';;', ok); + if (start < 0 || ok < 0 || end < 0) throw new Error('Could not locate gbrain-refresh ok) branch'); + return SRC.slice(ok, end); +} + +describe('gstack-config gbrain-refresh: machine-wide render guards', () => { + const branch = okBranch(); + + test('targets the global install', () => { + expect(branch).toContain('$HOME/.claude/skills/gstack'); + }); + + test('refuses a symlinked install (would dirty a dev worktree)', () => { + expect(branch).toMatch(/\[ -L "\$INSTALL_DIR" \]/); + }); + + test('verifies it is a real gstack clone before mutating it', () => { + expect(branch).toContain('$INSTALL_DIR/VERSION'); + expect(branch).toContain('$INSTALL_DIR/package.json'); + }); + + test('requires bun on PATH', () => { + expect(branch).toContain('command -v bun'); + }); + + test('renders the :user variant in place into the install', () => { + expect(branch).toContain('gen:skill-docs:user --host claude'); + }); + + test('is self-documenting about the reset --hard / re-run cycle', () => { + expect(branch).toContain('reset --hard'); + expect(branch).toContain('gbrain-refresh'); + }); +}); + +describe('CLAUDE.md: deploy section documents the re-run', () => { + test('notes re-running gbrain-refresh after reset --hard', () => { + const claudeMd = fs.readFileSync(path.join(ROOT, 'CLAUDE.md'), 'utf-8'); + const idx = claudeMd.indexOf('## Deploying to the active skill'); + expect(idx).toBeGreaterThan(-1); + const section = claudeMd.slice(idx, idx + 1200); + expect(section).toContain('gbrain-refresh'); + }); +}); diff --git a/test/gen-skill-docs-out-dir.test.ts b/test/gen-skill-docs-out-dir.test.ts new file mode 100644 index 000000000..fbc1345e4 --- /dev/null +++ b/test/gen-skill-docs-out-dir.test.ts @@ -0,0 +1,84 @@ +import { describe, test, expect } from 'bun:test'; +import { spawnSync } from 'child_process'; +import { createHash } from 'crypto'; +import * as path from 'path'; +import * as fs from 'fs'; +import * as os from 'os'; + +const ROOT = path.resolve(import.meta.dir, '..'); + +// Render the gbrain `:user` variant into a temp out-dir, forcing detection ON +// via a crafted GSTACK_HOME so the test is deterministic regardless of whether +// the dev machine actually has gbrain installed. Asserts the B2 contract: +// (a) the worktree SKILL.md is byte-unchanged (source stays canonical), +// (b) the out-dir SKILL.md gained the inline Brain Context Load block, +// (c) its section refs point at the out-dir, not ~/.claude/skills/gstack, +// (d) bin/ refs are left pointing at the global install, +// (e) the out-dir section file gained the Save Results to Brain block. +describe('gen-skill-docs --out-dir (B2 render isolation)', () => { + function hashFile(p: string): string { + return createHash('sha256').update(fs.readFileSync(p)).digest('hex'); + } + + test('renders :user to out-dir, rewrites section paths, leaves worktree canonical', () => { + const tmpHome = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-home-')); + const outDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-out-')); + const worktreeSkill = path.join(ROOT, 'ship', 'SKILL.md'); + const beforeHash = hashFile(worktreeSkill); + try { + // Force gbrain detection ON for --respect-detection. + fs.writeFileSync( + path.join(tmpHome, 'gbrain-detection.json'), + JSON.stringify({ gbrain_local_status: 'ok', gbrain_version: '9.9.9' }), + ); + + const res = spawnSync( + 'bun', + ['run', 'scripts/gen-skill-docs.ts', '--respect-detection', '--host', 'claude', '--out-dir', outDir], + { cwd: ROOT, encoding: 'utf-8', timeout: 120_000, env: { ...process.env, GSTACK_HOME: tmpHome } }, + ); + expect(res.status).toBe(0); + + const outSkill = path.join(outDir, 'ship', 'SKILL.md'); + const outSection = path.join(outDir, 'ship', 'sections', 'adversarial.md'); + expect(fs.existsSync(outSkill)).toBe(true); + const skillContent = fs.readFileSync(outSkill, 'utf-8'); + + // (a) worktree byte-unchanged + expect(hashFile(worktreeSkill)).toBe(beforeHash); + + // (b) inline block present in the rendered SKILL.md + expect(skillContent).toContain('Brain Context Load'); + + // (c) section refs repointed to the out-dir; none left pointing at the install + expect(skillContent).toContain(`${outDir}/ship/sections/`); + expect(skillContent).not.toContain('~/.claude/skills/gstack/ship/sections/'); + + // (d) bin refs are NOT rewritten — they still resolve to the global install + expect(skillContent).toContain('~/.claude/skills/gstack/bin/'); + + // (e) the SAVE block landed in the rendered section file + expect(fs.existsSync(outSection)).toBe(true); + expect(fs.readFileSync(outSection, 'utf-8')).toContain('Save Results to Brain'); + } finally { + fs.rmSync(tmpHome, { recursive: true, force: true }); + fs.rmSync(outDir, { recursive: true, force: true }); + } + }); + + test('global extras (proactive-suggestions.json) are NOT written in out-dir mode', () => { + const outDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-out-')); + try { + const res = spawnSync( + 'bun', + ['run', 'scripts/gen-skill-docs.ts', '--host', 'claude', '--out-dir', outDir], + { cwd: ROOT, encoding: 'utf-8', timeout: 120_000 }, + ); + expect(res.status).toBe(0); + // proactive-suggestions.json lives at a repo path; out-dir mode must skip it. + expect(fs.existsSync(path.join(outDir, 'scripts', 'proactive-suggestions.json'))).toBe(false); + } finally { + fs.rmSync(outDir, { recursive: true, force: true }); + } + }); +});