From 4a404c003639c868866de5f2a94d09d5034082a7 Mon Sep 17 00:00:00 2001
From: Garry Tan
Date: Sat, 18 Apr 2026 18:17:54 +0800
Subject: [PATCH] chore: bump version and changelog (v1.1.0.0)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- VERSION: 1.0.0.0 → 1.1.0.0 (MINOR bump — new user-facing commands)
- package.json: matching version bump
- CHANGELOG.md: new 1.1.0.0 entry describing load-html, screenshot --selector, viewport --scale, file:// support, setContent replay, and DX polish in user voice with a dedicated Security section for file:// safe-dirs policy
- browse/SKILL.md.tmpl: adds pattern #12 "Render local HTML", pattern #13 "Retina screenshots", and a full Puppeteer → browse cheatsheet with side-by-side API mapping and a worked tweet-renderer migration example
- browse/SKILL.md + SKILL.md: regenerated from templates via `bun run gen:skill-docs` to reflect the new command descriptions

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 CHANGELOG.md         | 24 ++++++++++++++++++
 SKILL.md             |  7 +++---
 VERSION              |  2 +-
 browse/SKILL.md      | 58 +++++++++++++++++++++++++++++++++++++++++---
 browse/SKILL.md.tmpl | 51 ++++++++++++++++++++++++++++++++++++++
 package.json         |  2 +-
 6 files changed, 136 insertions(+), 8 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ac13e0db..98a1a236 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,29 @@
 # Changelog
 
+## [1.1.0.0] - 2026-04-18
+
+### Added
+- **Browse can now render local HTML without an HTTP server.** Two ways: `$B goto file:///tmp/report.html` navigates to a local file (including cwd-relative `file://./x` and home-relative `file://~/x` forms, smart-parsed so you don't have to think about URL grammar), or `$B load-html /tmp/tweet.html` reads the file and loads it via `page.setContent()`. Both are scoped to cwd + temp dir for safety. If you're migrating a Puppeteer script that generates HTML in memory, this kills your Python-HTTP-server workaround.
+- **Element screenshots with an explicit flag.** `$B screenshot out.png --selector .card` is now the unambiguous way to screenshot a single element. Positional selectors still work, but tag selectors like `button` weren't recognized positionally, so the flag form fixes that. `--selector` composes with `--base64` and rejects alongside `--clip` (choose one).
+- **Retina screenshots via `--scale`.** `$B viewport 480x2000 --scale 2` sets `deviceScaleFactor: 2` and produces pixel-doubled screenshots. `$B viewport --scale 2` alone changes just the scale factor and keeps the current size. Scale is capped at 1-3 (gstack policy). Headed mode rejects the flag since scale is controlled by the real browser window.
+- **Load-HTML content survives scale changes.** Changing `--scale` rebuilds the browser context (that's how Playwright works), which previously would have wiped pages loaded via `load-html`. Now the HTML is cached in tab state and replayed into the new context automatically. In-memory only; never persisted to disk.
+- **Puppeteer → browse cheatsheet in SKILL.md.** Side-by-side table of Puppeteer APIs mapped to browse commands, plus a full worked example (tweet-renderer flow: viewport + scale + load-html + element screenshot).
+- **Guess-friendly aliases.** Type `setcontent` or `set-content` and it routes to `load-html`. Canonicalization happens before scope checks, so read-scoped tokens can't use the alias to bypass write-scope enforcement.
+- **`Did you mean ...?` on unknown commands.** `$B load-htm` returns `Unknown command: 'load-htm'. Did you mean 'load-html'?`. Levenshtein match within distance 2, gated on input length ≥ 4 so 2-letter typos don't produce noise.
+- **Rich, actionable errors on `load-html`.** Every rejection path (file not found, directory, oversize, outside safe dirs, binary content, frame context) names the input, explains the cause, and says what to do next. Extension allowlist `.html/.htm/.xhtml/.svg` + magic-byte sniff (with UTF-8 BOM strip) catches mis-renamed binaries before they render as garbage.
+
+### Security
+- `file://` navigation is now an accepted scheme in `goto`, scoped to cwd + temp dir via the existing `validateReadPath()` policy. UNC/network hosts (`file://host.example.com/...`), IP hosts, IPv6 hosts, and Windows drive-letter hosts are all rejected with explicit errors.
+
+### For contributors
+- `validateNavigationUrl()` now returns the normalized URL (previously void). All four callers — goto, diff, newTab, restoreState — updated to consume the return value so smart-parsing takes effect at every navigation site.
+- New `normalizeFileUrl()` helper uses `fileURLToPath()` + `pathToFileURL()` from `node:url` — never string-concat — so URL escapes like `%20` decode correctly and encoded-slash traversal (`%2F..%2F`) is rejected by Node outright.
+- New `TabSession.loadedHtml` field + `setTabContent()` / `getLoadedHtml()` / `clearLoadedHtml()` methods. ASCII lifecycle diagram in the source. The `clear` call happens BEFORE navigation starts (not after) so a goto that times out post-commit doesn't leave stale metadata that could resurrect on a later context recreation.
+- `BrowserManager.setDeviceScaleFactor(scale, w, h)` is atomic: validates input, stores new values, calls `recreateContext()`, rolls back the fields on failure. `currentViewport` tracking means recreateContext preserves your size instead of hardcoding 1280×720.
+- `COMMAND_ALIASES` + `canonicalizeCommand()` + `buildUnknownCommandError()` + `NEW_IN_VERSION` are exported from `browse/src/commands.ts`. Single source of truth — both the server dispatcher and `chain` prevalidation import from the same place. Chain uses `{ rawName, name }` shape per step so audit logs preserve what the user typed while dispatch uses the canonical name.
+- `load-html` is registered in `SCOPE_WRITE` in `browse/src/token-registry.ts`.
+- Review history for the curious: 3 Codex consults (20 + 10 + 6 gaps), DX review (TTHW ~4min → <60s, Champion tier), 2 Eng review passes. Third Codex pass caught the 4-caller bug for `validateNavigationUrl` that the eng passes missed. All findings folded into the plan.
+
 ## [1.0.0.0] - 2026-04-18
 
 ### Added
diff --git a/SKILL.md b/SKILL.md
index 4d3b1d41..33f479d2 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -797,7 +797,8 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
 |---------|-------------|
 | `back` | History back |
 | `forward` | History forward |
-| `goto ` | Navigate to URL |
+| `goto ` | Navigate to URL (http://, https://, or file:// scoped to cwd/TEMP_DIR) |
+| `load-html [--wait-until load|domcontentloaded|networkidle]` | Load a local HTML file via setContent (no HTTP server needed). For self-contained HTML (inline CSS/JS, data URIs). For HTML on disk, goto file://... is often cleaner. |
 | `reload` | Reload page |
 | `url` | Print current URL |
 
@@ -848,7 +849,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
 | `type ` | Type into focused element |
 | `upload [file2...]` | Upload file(s) |
 | `useragent ` | Set user agent |
-| `viewport ` | Set viewport size |
+| `viewport [] [--scale ]` | Set viewport size and optional deviceScaleFactor (1-3, for retina screenshots). --scale requires a context rebuild. |
 | `wait ` | Wait for element, network idle, or page load (timeout: 15s) |
 
 ### Inspection
@@ -875,7 +876,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
 | `pdf [path]` | Save as PDF |
 | `prettyscreenshot [--scroll-to sel|text] [--cleanup] [--hide sel...] [--width px] [path]` | Clean screenshot with optional cleanup, scroll positioning, and element hiding |
 | `responsive [prefix]` | Screenshots at mobile (375x812), tablet (768x1024), desktop (1280x720). Saves as {prefix}-mobile.png etc. |
-| `screenshot [--viewport] [--clip x,y,w,h] [selector|@ref] [path]` | Save screenshot (supports element crop via CSS/@ref, --clip region, --viewport) |
+| `screenshot [--selector ] [--viewport] [--clip x,y,w,h] [--base64] [selector|@ref] [path]` | Save screenshot. --selector targets a specific element (explicit flag form). Positional selectors starting with ./#/@/[ still work. |
 
 ### Snapshot
 | Command | Description |
diff --git a/VERSION b/VERSION
index 1921233b..a6bbdb5f 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.0.0.0
+1.1.0.0
diff --git a/browse/SKILL.md b/browse/SKILL.md
index d112a9d4..23b32a85 100644
--- a/browse/SKILL.md
+++ b/browse/SKILL.md
@@ -584,6 +584,57 @@ $B diff https://staging.app.com https://prod.app.com
 ### 11. Show screenshots to the user
 After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible.
 
+### 12. Render local HTML (no HTTP server needed)
+Two paths, pick the cleaner one:
+```bash
+# HTML file on disk → goto file:// (absolute, or cwd-relative)
+$B goto file:///tmp/report.html
+$B goto file://./docs/page.html # cwd-relative
+$B goto file://~/Documents/page.html # home-relative
+
+# HTML generated in memory → load-html reads the file into setContent
+echo '<div class="tweet-card">hello</div>' > /tmp/tweet.html
+$B load-html /tmp/tweet.html
+```
+
+`goto file://...` is usually cleaner (URL is saved in state, relative asset URLs resolve against the file's dir, scale changes replay naturally). `load-html` uses `page.setContent()` — URL stays `about:blank`, but the content survives `viewport --scale` via in-memory replay. Both are scoped to files under cwd or `$TMPDIR`.
+
+### 13. Retina screenshots (deviceScaleFactor)
+```bash
+$B viewport 480x600 --scale 2 # 2x deviceScaleFactor
+$B load-html /tmp/tweet.html # or: $B goto file://./tweet.html
+$B screenshot /tmp/out.png --selector .tweet-card
+# → /tmp/out.png is 2x the pixel dimensions of the element
+```
+Scale must be 1-3 (gstack policy cap). Changing `--scale` recreates the browser context; refs from `snapshot` are invalidated (rerun `snapshot`), but `load-html` content is replayed automatically. Not supported in headed mode.
+
+## Puppeteer → browse cheatsheet
+
+Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow:
+
+| Puppeteer | browse |
+|---|---|
+| `await page.goto(url)` | `$B goto ` |
+| `await page.setContent(html)` | `$B load-html ` (or `$B goto file://`) |
+| `await page.setViewport({width, height})` | `$B viewport WxH` |
+| `await page.setViewport({width, height, deviceScaleFactor: 2})` | `$B viewport WxH --scale 2` |
+| `await (await page.$('.x')).screenshot({path})` | `$B screenshot --selector .x` |
+| `await page.screenshot({fullPage: true, path})` | `$B screenshot ` (full page default) |
+| `await page.screenshot({clip: {x, y, w, h}, path})` | `$B screenshot --clip x,y,w,h` |
+
+Worked example (the tweet-renderer flow — Puppeteer → browse):
+
+```bash
+# Generate HTML in memory, render at 2x scale, screenshot the tweet card.
+echo '<div class="tweet-card" style="width:400px;height:200px">hello</div>' > /tmp/tweet.html
+$B viewport 480x600 --scale 2
+$B load-html /tmp/tweet.html
+$B screenshot /tmp/out.png --selector .tweet-card
+# /tmp/out.png is 800x400 px, crisp (2x deviceScaleFactor).
+```
+
+Aliases: typing `setcontent` or `set-content` routes to `load-html` automatically. Typing a typo (`load-htm`) returns `Did you mean 'load-html'?`.
+
 ## User Handoff
 
 When you hit something you can't handle in headless mode (CAPTCHA, complex auth, multi-factor
@@ -688,7 +739,8 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
 |---------|-------------|
 | `back` | History back |
 | `forward` | History forward |
-| `goto ` | Navigate to URL |
+| `goto ` | Navigate to URL (http://, https://, or file:// scoped to cwd/TEMP_DIR) |
+| `load-html [--wait-until load|domcontentloaded|networkidle]` | Load a local HTML file via setContent (no HTTP server needed). For self-contained HTML (inline CSS/JS, data URIs). For HTML on disk, goto file://... is often cleaner. |
 | `reload` | Reload page |
 | `url` | Print current URL |
 
@@ -739,7 +791,7 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
 | `type ` | Type into focused element |
 | `upload [file2...]` | Upload file(s) |
 | `useragent ` | Set user agent |
-| `viewport ` | Set viewport size |
+| `viewport [] [--scale ]` | Set viewport size and optional deviceScaleFactor (1-3, for retina screenshots). --scale requires a context rebuild. |
 | `wait ` | Wait for element, network idle, or page load (timeout: 15s) |
 
 ### Inspection
@@ -766,7 +818,7 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
 | `pdf [path]` | Save as PDF |
 | `prettyscreenshot [--scroll-to sel|text] [--cleanup] [--hide sel...] [--width px] [path]` | Clean screenshot with optional cleanup, scroll positioning, and element hiding |
 | `responsive [prefix]` | Screenshots at mobile (375x812), tablet (768x1024), desktop (1280x720). Saves as {prefix}-mobile.png etc. |
-| `screenshot [--viewport] [--clip x,y,w,h] [selector|@ref] [path]` | Save screenshot (supports element crop via CSS/@ref, --clip region, --viewport) |
+| `screenshot [--selector ] [--viewport] [--clip x,y,w,h] [--base64] [selector|@ref] [path]` | Save screenshot. --selector targets a specific element (explicit flag form). Positional selectors starting with ./#/@/[ still work. |
 
 ### Snapshot
 | Command | Description |
diff --git a/browse/SKILL.md.tmpl b/browse/SKILL.md.tmpl
index 5d4ba8fc..ec4fcad7 100644
--- a/browse/SKILL.md.tmpl
+++ b/browse/SKILL.md.tmpl
@@ -111,6 +111,57 @@ $B diff https://staging.app.com https://prod.app.com
 ### 11. Show screenshots to the user
 After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible.
 
+### 12. Render local HTML (no HTTP server needed)
+Two paths, pick the cleaner one:
+```bash
+# HTML file on disk → goto file:// (absolute, or cwd-relative)
+$B goto file:///tmp/report.html
+$B goto file://./docs/page.html # cwd-relative
+$B goto file://~/Documents/page.html # home-relative
+
+# HTML generated in memory → load-html reads the file into setContent
+echo '<div class="tweet-card">hello</div>' > /tmp/tweet.html
+$B load-html /tmp/tweet.html
+```
+
+`goto file://...` is usually cleaner (URL is saved in state, relative asset URLs resolve against the file's dir, scale changes replay naturally). `load-html` uses `page.setContent()` — URL stays `about:blank`, but the content survives `viewport --scale` via in-memory replay. Both are scoped to files under cwd or `$TMPDIR`.
+
+### 13. Retina screenshots (deviceScaleFactor)
+```bash
+$B viewport 480x600 --scale 2 # 2x deviceScaleFactor
+$B load-html /tmp/tweet.html # or: $B goto file://./tweet.html
+$B screenshot /tmp/out.png --selector .tweet-card
+# → /tmp/out.png is 2x the pixel dimensions of the element
+```
+Scale must be 1-3 (gstack policy cap). Changing `--scale` recreates the browser context; refs from `snapshot` are invalidated (rerun `snapshot`), but `load-html` content is replayed automatically. Not supported in headed mode.
+
+## Puppeteer → browse cheatsheet
+
+Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow:
+
+| Puppeteer | browse |
+|---|---|
+| `await page.goto(url)` | `$B goto ` |
+| `await page.setContent(html)` | `$B load-html ` (or `$B goto file://`) |
+| `await page.setViewport({width, height})` | `$B viewport WxH` |
+| `await page.setViewport({width, height, deviceScaleFactor: 2})` | `$B viewport WxH --scale 2` |
+| `await (await page.$('.x')).screenshot({path})` | `$B screenshot --selector .x` |
+| `await page.screenshot({fullPage: true, path})` | `$B screenshot ` (full page default) |
+| `await page.screenshot({clip: {x, y, w, h}, path})` | `$B screenshot --clip x,y,w,h` |
+
+Worked example (the tweet-renderer flow — Puppeteer → browse):
+
+```bash
+# Generate HTML in memory, render at 2x scale, screenshot the tweet card.
+echo '<div class="tweet-card" style="width:400px;height:200px">hello</div>' > /tmp/tweet.html
+$B viewport 480x600 --scale 2
+$B load-html /tmp/tweet.html
+$B screenshot /tmp/out.png --selector .tweet-card
+# /tmp/out.png is 800x400 px, crisp (2x deviceScaleFactor).
+```
+
+Aliases: typing `setcontent` or `set-content` routes to `load-html` automatically. Typing a typo (`load-htm`) returns `Did you mean 'load-html'?`.
+
 ## User Handoff
 
 When you hit something you can't handle in headless mode (CAPTCHA, complex auth, multi-factor
diff --git a/package.json b/package.json
index cfc1703c..732fcde1 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "gstack",
-  "version": "1.0.0.0",
+  "version": "1.1.0.0",
   "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
   "license": "MIT",
   "type": "module",
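
Reviewer note on the `Did you mean ...?` entry in the changelog above: it describes a Levenshtein match within distance 2, gated on input length ≥ 4. The following is only a minimal TypeScript sketch of that idea, not the code in `browse/src/commands.ts`. The command list, the names `levenshtein`, `suggestCommand`, and `unknownCommandMessage`, and the first-closest-wins tie-break are all assumptions made for illustration.

```typescript
// Hypothetical command list for illustration only; the real registry is not shown in this patch.
const KNOWN_COMMANDS: string[] = ["goto", "load-html", "screenshot", "viewport", "snapshot"];

// Classic two-row dynamic-programming edit distance (insert, delete, substitute all cost 1).
function levenshtein(a: string, b: string): number {
  // prev[j] is the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the closest known command within distance 2, or null.
// Inputs shorter than 4 characters never get a suggestion (the gate described in the changelog).
function suggestCommand(input: string, known: string[] = KNOWN_COMMANDS): string | null {
  if (input.length < 4) return null;
  let best: string | null = null;
  let bestDist = 3; // anything at distance 3 or more is ignored
  for (const name of known) {
    const d = levenshtein(input, name);
    if (d < bestDist) {
      best = name;
      bestDist = d;
    }
  }
  return best;
}

// Assumed message shape, copied from the example quoted in the changelog entry.
function unknownCommandMessage(input: string): string {
  const hint = suggestCommand(input);
  return hint
    ? `Unknown command: '${input}'. Did you mean '${hint}'?`
    : `Unknown command: '${input}'.`;
}
```

Under these assumptions, `unknownCommandMessage("load-htm")` produces the message quoted in the changelog, while a three-character input such as `got` gets no suggestion because of the length gate.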