diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e67a307d1..a4872fc47 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -232,6 +232,14 @@ For template authoring best practices (natural language over bash-isms, dynamic To add a browse command, add it to `browse/src/commands.ts`. To add a snapshot flag, add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts`. Then rebuild. +**Don't bundle puppeteer/Chromium in a skill.** `browse` is the one shared +Chromium per box, including offline local-render workloads. A skill that needs to +rasterize its own HTML/JSON (diagrams, cards, og-images) should route through +`browse` — `screenshot --selector` for visual output, `load-html` + `js --out` for +bytes a render function returns — instead of `npm i puppeteer` and downloading a +second Chromium that drifts out of version sync. One install to pin, one daemon to +manage. + ## Jargon list (V1 writing style) gstack's Writing Style section (injected into every tier-≥2 skill's preamble) diff --git a/SKILL.md b/SKILL.md index 0b06b802b..8711ae7f3 100644 --- a/SKILL.md +++ b/SKILL.md @@ -917,10 +917,10 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`. | `cookies` | All cookies as JSON | | `css ` | Computed CSS value | | `dialog [--clear]` | Dialog messages | -| `eval ` | Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners. | +| `eval [--out ] [--raw]` | Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners. With --out , the result is written to disk (base64 data URL decoded to bytes unless --raw); --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel). | | `inspect [selector] [--all] [--history]` | Deep CSS inspection via CDP — full rule cascade, box model, computed styles | | `is ` | State check on element. Valid values: visible, hidden, enabled, disabled, checked, editable, focused (case-sensitive). accepts a CSS selector OR an @ref token from a prior snapshot (e.g. @e3, @c1) — refs are interchangeable with selectors anywhere a selector is expected. | -| `js ` | Run inline JavaScript expression in the page context and return result as string. Same JS sandbox as eval; the only difference is js takes an inline expr while eval reads from a file. | +| `js [--out ] [--raw]` | Run inline JavaScript expression in the page context and return result as string. Same JS sandbox as eval; the only difference is js takes an inline expr while eval reads from a file. With --out , the result is written to disk instead of returned (a base64 data URL is decoded to raw bytes unless --raw is given) — ideal for rasterizing local renders to PNG without serializing megabytes back through the CLI. --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel). | | `network [--clear]` | Network requests | | `perf` | Page load timings | | `storage | storage set ` | Read both localStorage and sessionStorage as JSON. With "set ", write to localStorage only (sessionStorage is read-only via this command — set it with `js sessionStorage.setItem(...)`). | diff --git a/browse/SKILL.md b/browse/SKILL.md index e36fc9c86..0f670d47a 100644 --- a/browse/SKILL.md +++ b/browse/SKILL.md @@ -644,6 +644,51 @@ $B screenshot /tmp/out.png --selector .tweet-card ``` Scale must be 1-3 (gstack policy cap). Changing `--scale` recreates the browser context; refs from `snapshot` are invalidated (rerun `snapshot`), but `load-html` content is replayed automatically. Not supported in headed mode. +### 14. Offline render mode (rasterize your own HTML/JSON, zero network) + +This is the blessed path for "I just want to turn my own local HTML or JSON into a +PNG/PDF/bytes on disk" — Excalidraw diagrams, tweet/quote cards, og-images, +report rasterization. It is **plain headless, shared Chromium, no proxy, no Xvfb, +no anti-bot stealth**. Default `$B` is already exactly this; you do not pass +`--headed` or `--proxy`. One Chromium per box, shared by every skill — **do not +`npm i puppeteer` and ship a second browser** (see the note under the cheatsheet). + +Two output shapes, pick by what you have: + +**A) Visual output → `screenshot --selector` (preferred).** If the thing you want +is a picture of something on the page, screenshot it. The PNG is written from the +browser process straight to disk — the image bytes never cross the CDP wire. + +```bash +echo '
hi
' > /tmp/card.html +$B viewport 480x600 --scale 2 +$B load-html /tmp/card.html +$B screenshot /tmp/card.png --selector '#card' # disk path — no megabytes over CDP +``` +(Use the disk path, NOT `screenshot --base64` — base64 serializes the bytes back +through the command channel, which is the cost you're trying to avoid.) + +**B) Bytes a function returns → `js --out` / `eval --out`.** When a library hands +you the result as a return value (a base64 data URL, a blob, computed JSON) rather +than painting a stable element — e.g. Excalidraw's export function returns a PNG +data URL — write the evaluate result straight to disk. `--out` decodes a +`data:*;base64,...` result to raw bytes automatically (pass `--raw` to write the +literal string). The payload is written by the daemon and never serialized back +out to the CLI/stdout. + +```bash +# Load the render bundle, signal readiness, then render-to-file. +$B load-html /tmp/excalidraw-export.html # bundle sets window.__render + a #done flag +$B wait '#done' # deterministic ready handshake +$B js "window.__render(SCENE_JSON)" --out /tmp/diagram.png # data URL → decoded PNG on disk +``` + +`--out` is a WRITE: it needs the `write` scope and is never allowed over the +pair-agent tunnel (a remote agent can't write to your disk). Parent directories +are created; malformed base64 errors instead of writing corrupt bytes. Pick A when +you can (no CDP transfer at all); reach for B only when the bytes come back as a +return value. + ## Puppeteer → browse cheatsheet Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow: @@ -657,6 +702,8 @@ Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow: | `await (await page.$('.x')).screenshot({path})` | `$B screenshot --selector .x` | | `await page.screenshot({fullPage: true, path})` | `$B screenshot ` (full page default) | | `await page.screenshot({clip: {x, y, w, h}, path})` | `$B screenshot --clip x,y,w,h` | +| `const r = await page.evaluate(fn)` | `$B js ""` (result to stdout) | +| `fs.writeFileSync(out, Buffer.from(dataUrl.split(',')[1],'base64'))` | `$B js "" --out ` (data URL auto-decoded) | Worked example (the tweet-renderer flow — Puppeteer → browse): @@ -671,6 +718,13 @@ $B screenshot /tmp/out.png --selector .tweet-card Aliases: typing `setcontent` or `set-content` routes to `load-html` automatically. Typing a typo (`load-htm`) returns `Did you mean 'load-html'?`. +**Don't bundle your own puppeteer/Chromium.** `browse` is the one shared Chromium +per box. Skills that need to rasterize local HTML/JSON (diagrams, cards, og-images) +should route through `browse` — `screenshot --selector` for visual output, +`load-html` + `js --out` for bytes a function returns — instead of +`npm i puppeteer` and downloading a second Chromium that drifts out of version sync. +One install to pin, one daemon's lifecycle to manage. + ## User Handoff When you hit something you can't handle in headless mode (CAPTCHA, complex auth, multi-factor @@ -875,10 +929,10 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero | `cookies` | All cookies as JSON | | `css ` | Computed CSS value | | `dialog [--clear]` | Dialog messages | -| `eval ` | Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners. | +| `eval [--out ] [--raw]` | Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners. With --out , the result is written to disk (base64 data URL decoded to bytes unless --raw); --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel). | | `inspect [selector] [--all] [--history]` | Deep CSS inspection via CDP — full rule cascade, box model, computed styles | | `is ` | State check on element. Valid values: visible, hidden, enabled, disabled, checked, editable, focused (case-sensitive). accepts a CSS selector OR an @ref token from a prior snapshot (e.g. @e3, @c1) — refs are interchangeable with selectors anywhere a selector is expected. | -| `js ` | Run inline JavaScript expression in the page context and return result as string. Same JS sandbox as eval; the only difference is js takes an inline expr while eval reads from a file. | +| `js [--out ] [--raw]` | Run inline JavaScript expression in the page context and return result as string. Same JS sandbox as eval; the only difference is js takes an inline expr while eval reads from a file. With --out , the result is written to disk instead of returned (a base64 data URL is decoded to raw bytes unless --raw is given) — ideal for rasterizing local renders to PNG without serializing megabytes back through the CLI. --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel). | | `network [--clear]` | Network requests | | `perf` | Page load timings | | `storage | storage set ` | Read both localStorage and sessionStorage as JSON. With "set ", write to localStorage only (sessionStorage is read-only via this command — set it with `js sessionStorage.setItem(...)`). | diff --git a/browse/SKILL.md.tmpl b/browse/SKILL.md.tmpl index a466fc446..9a159e4c9 100644 --- a/browse/SKILL.md.tmpl +++ b/browse/SKILL.md.tmpl @@ -135,6 +135,51 @@ $B screenshot /tmp/out.png --selector .tweet-card ``` Scale must be 1-3 (gstack policy cap). Changing `--scale` recreates the browser context; refs from `snapshot` are invalidated (rerun `snapshot`), but `load-html` content is replayed automatically. Not supported in headed mode. +### 14. Offline render mode (rasterize your own HTML/JSON, zero network) + +This is the blessed path for "I just want to turn my own local HTML or JSON into a +PNG/PDF/bytes on disk" — Excalidraw diagrams, tweet/quote cards, og-images, +report rasterization. It is **plain headless, shared Chromium, no proxy, no Xvfb, +no anti-bot stealth**. Default `$B` is already exactly this; you do not pass +`--headed` or `--proxy`. One Chromium per box, shared by every skill — **do not +`npm i puppeteer` and ship a second browser** (see the note under the cheatsheet). + +Two output shapes, pick by what you have: + +**A) Visual output → `screenshot --selector` (preferred).** If the thing you want +is a picture of something on the page, screenshot it. The PNG is written from the +browser process straight to disk — the image bytes never cross the CDP wire. + +```bash +echo '
hi
' > /tmp/card.html +$B viewport 480x600 --scale 2 +$B load-html /tmp/card.html +$B screenshot /tmp/card.png --selector '#card' # disk path — no megabytes over CDP +``` +(Use the disk path, NOT `screenshot --base64` — base64 serializes the bytes back +through the command channel, which is the cost you're trying to avoid.) + +**B) Bytes a function returns → `js --out` / `eval --out`.** When a library hands +you the result as a return value (a base64 data URL, a blob, computed JSON) rather +than painting a stable element — e.g. Excalidraw's export function returns a PNG +data URL — write the evaluate result straight to disk. `--out` decodes a +`data:*;base64,...` result to raw bytes automatically (pass `--raw` to write the +literal string). The payload is written by the daemon and never serialized back +out to the CLI/stdout. + +```bash +# Load the render bundle, signal readiness, then render-to-file. +$B load-html /tmp/excalidraw-export.html # bundle sets window.__render + a #done flag +$B wait '#done' # deterministic ready handshake +$B js "window.__render(SCENE_JSON)" --out /tmp/diagram.png # data URL → decoded PNG on disk +``` + +`--out` is a WRITE: it needs the `write` scope and is never allowed over the +pair-agent tunnel (a remote agent can't write to your disk). Parent directories +are created; malformed base64 errors instead of writing corrupt bytes. Pick A when +you can (no CDP transfer at all); reach for B only when the bytes come back as a +return value. + ## Puppeteer → browse cheatsheet Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow: @@ -148,6 +193,8 @@ Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow: | `await (await page.$('.x')).screenshot({path})` | `$B screenshot --selector .x` | | `await page.screenshot({fullPage: true, path})` | `$B screenshot ` (full page default) | | `await page.screenshot({clip: {x, y, w, h}, path})` | `$B screenshot --clip x,y,w,h` | +| `const r = await page.evaluate(fn)` | `$B js ""` (result to stdout) | +| `fs.writeFileSync(out, Buffer.from(dataUrl.split(',')[1],'base64'))` | `$B js "" --out ` (data URL auto-decoded) | Worked example (the tweet-renderer flow — Puppeteer → browse): @@ -162,6 +209,13 @@ $B screenshot /tmp/out.png --selector .tweet-card Aliases: typing `setcontent` or `set-content` routes to `load-html` automatically. Typing a typo (`load-htm`) returns `Did you mean 'load-html'?`. +**Don't bundle your own puppeteer/Chromium.** `browse` is the one shared Chromium +per box. Skills that need to rasterize local HTML/JSON (diagrams, cards, og-images) +should route through `browse` — `screenshot --selector` for visual output, +`load-html` + `js --out` for bytes a function returns — instead of +`npm i puppeteer` and downloading a second Chromium that drifts out of version sync. +One install to pin, one daemon's lifecycle to manage. + ## User Handoff When you hit something you can't handle in headless mode (CAPTCHA, complex auth, multi-factor diff --git a/browse/src/commands.ts b/browse/src/commands.ts index 7e647a002..73bc9ab1b 100644 --- a/browse/src/commands.ts +++ b/browse/src/commands.ts @@ -106,8 +106,8 @@ export const COMMAND_DESCRIPTIONS: Record' }, - 'eval': { category: 'Inspection', description: 'Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners.', usage: 'eval ' }, + 'js': { category: 'Inspection', description: 'Run inline JavaScript expression in the page context and return result as string. Same JS sandbox as eval; the only difference is js takes an inline expr while eval reads from a file. With --out , the result is written to disk instead of returned (a base64 data URL is decoded to raw bytes unless --raw is given) — ideal for rasterizing local renders to PNG without serializing megabytes back through the CLI. --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel).', usage: 'js [--out ] [--raw]' }, + 'eval': { category: 'Inspection', description: 'Run JavaScript from a file in the page context and return result as string. Path must resolve under /tmp or cwd (no traversal). Use eval for multi-line scripts; use js for one-liners. With --out , the result is written to disk (base64 data URL decoded to bytes unless --raw); --out makes the invocation a WRITE (needs write scope, never allowed over the tunnel).', usage: 'eval [--out ] [--raw]' }, 'css': { category: 'Inspection', description: 'Computed CSS value', usage: 'css ' }, 'attrs': { category: 'Inspection', description: 'Element attributes as JSON', usage: 'attrs ' }, 'is': { category: 'Inspection', description: 'State check on element. Valid values: visible, hidden, enabled, disabled, checked, editable, focused (case-sensitive). accepts a CSS selector OR an @ref token from a prior snapshot (e.g. @e3, @c1) — refs are interchangeable with selectors anywhere a selector is expected.', usage: 'is ' }, diff --git a/gstack/llms.txt b/gstack/llms.txt index a11b045d1..9f9f717ec 100644 --- a/gstack/llms.txt +++ b/gstack/llms.txt @@ -81,10 +81,10 @@ Run with `browse [args]`. Full reference: `browse/SKILL.md`. - `cookies`: All cookies as JSON - `css `: Computed CSS value - `dialog [--clear]`: Dialog messages -- `eval `: Run JavaScript from a file in the page context and return result as string. +- `eval [--out ] [--raw]`: Run JavaScript from a file in the page context and return result as string. - `inspect [selector] [--all] [--history]`: Deep CSS inspection via CDP — full rule cascade, box model, computed styles - `is `: State check on element. -- `js `: Run inline JavaScript expression in the page context and return result as string. +- `js [--out ] [--raw]`: Run inline JavaScript expression in the page context and return result as string. - `network [--clear]`: Network requests - `perf`: Page load timings - `storage | storage set `: Read both localStorage and sessionStorage as JSON.