mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 11:17:50 +02:00
c15b805cd8
* feat(browse): TabSession loadedHtml + command aliases + DX polish primitives
Adds the foundation layer for Puppeteer-parity features:
- TabSession.loadedHtml + setTabContent/getLoadedHtml/clearLoadedHtml —
enables load-html content to survive context recreation (viewport --scale)
via in-memory replay. ASCII lifecycle diagram in the source explains the
clear-before-navigation contract.
- COMMAND_ALIASES + canonicalizeCommand() helper — single source of truth
for name aliases (setcontent / set-content / setContent → load-html),
consumed by server dispatch and chain prevalidation.
- buildUnknownCommandError() pure function — rich error messages with
Levenshtein-based "Did you mean" suggestions (distance ≤ 2, input
length ≥ 4 to skip 2-letter noise) and NEW_IN_VERSION upgrade hints.
- load-html registered in WRITE_COMMANDS + SCOPE_WRITE so scoped write
tokens can use it.
- screenshot and viewport descriptions updated for upcoming flags.
- New browse/test/dx-polish.test.ts (15 tests): alias canonicalization,
Levenshtein threshold + alphabetical tiebreak, short-input guard,
NEW_IN_VERSION upgrade hint, alias + scope integration invariants.
No consumers yet — pure additive foundation. Safe to bisect on its own.
* feat(browse): accept file:// in goto with smart cwd/home-relative parsing
Extends validateNavigationUrl to accept file:// URLs scoped to safe dirs
(cwd + TEMP_DIR) via the existing validateReadPath policy. The workhorse is a
new normalizeFileUrl() helper that handles non-standard relative forms BEFORE
the WHATWG URL parser sees them:
file:///abs/path.html → unchanged
file://./docs/page.html → file://<cwd>/docs/page.html
file://~/Documents/page.html → file://<HOME>/Documents/page.html
file://docs/page.html → file://<cwd>/docs/page.html
file://localhost/abs/path → unchanged
file://host.example.com/... → rejected (UNC/network)
file:// and file:/// → rejected (would list a directory)
Host heuristic rejects segments with '.', ':', '\\', '%', IPv6 brackets, or
Windows drive-letter patterns — so file://docs.v1/page.html, file://127.0.0.1/x,
file://[::1]/x, and file://C:/Users/x are explicit errors.
Uses fileURLToPath() + pathToFileURL() from node:url (never string-concat) so
URL escapes like %20 decode correctly and Node rejects encoded-slash traversal
(%2F..%2F) outright.
Signature change: validateNavigationUrl now returns Promise<string> (the
normalized URL) instead of Promise<void>. Existing callers that ignore the
return value still compile — they just don't benefit from smart-parsing until
updated in follow-up commits. Callers will be migrated in the next few commits
(goto, diff, newTab, restoreState).
Rewrites the url-validation test file: updates existing tests for the new
return type, adds 20+ new tests covering every normalizeFileUrl shape variant,
URL-encoding edge cases, and path-traversal rejection.
References: codex consult v3 P1 findings on URL parser semantics and fileURLToPath.
* feat(browse): BrowserManager deviceScaleFactor + setContent replay + file:// plumbing
Three tightly-coupled changes to BrowserManager, all in service of the
Puppeteer-parity workflow:
1. deviceScaleFactor + currentViewport tracking. New private fields (default
scale=1, viewport=1280x720) + setDeviceScaleFactor(scale, w, h) method.
deviceScaleFactor is a context-level Playwright option — changing it
requires recreateContext(). The method validates (finite number, 1-3 cap,
headed-mode rejected), stores new values, calls recreateContext(), and
rolls back the fields on failure so a bad call doesn't leave inconsistent
state. Context options at all three sites (launch, recreate happy path,
recreate fallback) now honor the stored values instead of hardcoding
1280x720.
2. BrowserState.loadedHtml + loadedHtmlWaitUntil. saveState captures per-tab
loadedHtml from the session; restoreState replays it via newSession.
setTabContent() — NOT bare page.setContent() — so TabSession.loadedHtml
is rehydrated and survives *subsequent* scale changes. In-memory only,
never persisted to disk (HTML may contain secrets or customer data).
3. newTab + restoreState now consume validateNavigationUrl's normalized
return value. file://./x, file://~/x, and bare-segment forms now take
effect at every navigation site, not just the top-level goto command.
Together these enable: load-html → viewport --scale 2 → viewport --scale 1.5
→ screenshot, with content surviving both context recreations. Codex v2 P0
flagged that bare page.setContent in restoreState would lose content on the
second scale change — this commit implements the rehydration path.
References: codex v2 P0 (TabSession rehydration), codex v3 P1 (4-caller
return value), plan Feature 3 + Feature 4.
* feat(browse): load-html, screenshot --selector, viewport --scale, alias dispatch
Wires the new handlers and dispatch logic that the previous commits made
possible:
write-commands.ts
- New 'load-html' case: validateReadPath for safe-dir scoping, stat-based
actionable errors (not found, directory, oversize), extension allowlist
(.html/.htm/.xhtml/.svg), magic-byte sniff with UTF-8 BOM strip accepting
any <[a-zA-Z!?] markup opener (not just <!doctype — bare fragments like
<div>...</div> work for setContent), 50MB cap via GSTACK_BROWSE_MAX_HTML_BYTES
override, frame-context rejection. Calls session.setTabContent() so replay
metadata is rehydrated.
- viewport command extended: optional [<WxH>], optional [--scale <n>],
scale-only variant reads current size via page.viewportSize(). Invalid
scale (NaN, Infinity, empty, out of 1-3) throws with named value. Headed
mode rejected explicitly.
- clearLoadedHtml() called BEFORE goto/back/forward/reload navigation
(not after) so a timed-out goto post-commit doesn't leave stale metadata
that could resurrect on a later context recreation. Codex v2 P1 catch.
- goto uses validateNavigationUrl's normalized return value.
meta-commands.ts
- screenshot --selector <css> flag: explicit element-screenshot form.
Rejects alongside positional selector (both = error), preserves --clip
conflict at line 161, composes with --base64 at lines 168-174.
- chain canonicalizes each step with canonicalizeCommand — step shape is
now { rawName, name, args } so prevalidation, dispatch, WRITE_COMMANDS.has,
watch blocking, and result labels all use canonical names while audit
labels show 'rawName→name' when aliased. Codex v3 P2 catch — prior shape
only canonicalized at prevalidation and diverged everywhere else.
- diff command consumes validateNavigationUrl return value for both URLs.
server.ts
- Command canonicalization inserted immediately after parse, before scope /
watch / tab-ownership / content-wrapping checks. rawCommand preserved for
future audit (not wired into audit log in this commit — follow-up).
- Unknown-command handler replaced with buildUnknownCommandError() from
commands.ts — produces 'Unknown command: X. Did you mean Y?' with optional
upgrade hint for NEW_IN_VERSION entries.
security-audit-r2.test.ts
- Updated chain-loop marker from 'for (const cmd of commands)' to
'for (const c of commands)' to match the new chain step shape. Same
isWatching + BLOCKED invariants still asserted.
* chore: bump version and changelog (v1.1.0.0)
- VERSION: 1.0.0.0 → 1.1.0.0 (MINOR bump — new user-facing commands)
- package.json: matching version bump
- CHANGELOG.md: new 1.1.0.0 entry describing load-html, screenshot --selector,
viewport --scale, file:// support, setContent replay, and DX polish in user
voice with a dedicated Security section for file:// safe-dirs policy
- browse/SKILL.md.tmpl: adds pattern #12 "Render local HTML", pattern #13
"Retina screenshots", and a full Puppeteer → browse cheatsheet with side-by-
side API mapping and a worked tweet-renderer migration example
- browse/SKILL.md + SKILL.md: regenerated from templates via `bun run gen:skill-docs`
to reflect the new command descriptions
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: pre-landing review fixes (9 findings from specialist + adversarial review)
Adversarial review (Claude subagent + Codex) surfaced 9 bugs across
CRITICAL/HIGH severity. All fixed:
1. tab-session.ts:setTabContent — state mutation moved AFTER the setContent
await. Prior order left phantom HTML in replay metadata if setContent
threw (timeout, browser crash), which a later viewport --scale would
silently replay. Now loadedHtml is only recorded on successful load.
2. browser-manager.ts:setDeviceScaleFactor — rollback now forces a second
recreateContext after restoring the old fields. The fallback path in
the original recreateContext builds a blank context using whatever
this.deviceScaleFactor/currentViewport hold at that moment (which were
the NEW values we were trying to apply). Rolling back the fields without
a second recreate left the live context at new-scale while state tracked
old-scale. Now: restore fields, force re-recreate with old values, only
if that ALSO fails do we return a combined error.
3. commands.ts:buildUnknownCommandError — Levenshtein tiebreak simplified
to 'd <= 2 && d < bestDist' (strict less). Candidates are pre-sorted
alphabetically, so first equal-distance wins by default. The prior
'(d === bestDist && best !== undefined && cand < best)' clause was dead
code.
4. tab-session.ts:onMainFrameNavigated — now clears loadedHtml, not just
refs + frame. Without this, a user who load-html'd then clicked a link
(or had a form submit / JS redirect / OAuth flow) would retain the stale
replay metadata. The next viewport --scale would silently revert the
tab to the ORIGINAL loaded HTML, losing whatever the post-navigation
content was. Silent data corruption. Browser-emitted navigations trigger
this path via wirePageEvents.
5. browser-manager.ts:saveState + restoreState — tab ownership now flows
through BrowserState.owner. Without this, a scoped agent's viewport
--scale would strand them: tab IDs change during recreate, ownership
map held stale IDs, owner lookup failed. New IDs had no owner, so
writes without tabId were denied (DoS). Worse, if the agent sent a
stale tabId the server's swallowed-tab-switch-error path would let the
command hit whatever tab was currently active (cross-tab authz bypass).
Now: clear ownership before restore, re-add per-tab with new IDs.
6. meta-commands.ts:state load — disk-loaded state.pages is now explicit
allowlist (url, isActive, storage:null) instead of object spread.
Spreading accepted loadedHtml, loadedHtmlWaitUntil, and owner from a
user-writable state file, letting a tampered state.json smuggle HTML
past load-html's safe-dirs / extension / magic-byte / 50MB-cap
validators, or forge tab ownership. Now stripped at the boundary.
7. url-validation.ts:normalizeFileUrl — preserves query string + fragment
across normalization. file://./app.html?route=home#login previously
resolved to a filesystem path that URL-encoded '?' as %3F and '#' as
%23, or (for absolute forms) pathToFileURL dropped them entirely. SPAs
and fixture URLs with query params 404'd or loaded the wrong route.
Now: split on ?/# before path resolution, reattach after.
8. url-validation.ts:validateNavigationUrl — reattaches parsed.search +
parsed.hash to the normalized file:// URL. Same fix at the main
validator for absolute paths that go through fileURLToPath round-trip.
9. server.ts:writeAuditEntry — audit entries now include aliasOf when the
user typed an alias ('setcontent' → cmd: 'load-html', aliasOf:
'setcontent'). Previously the isAliased variable was computed but
dropped, losing the raw input from the forensic trail. Completes the
plan's codex v3 P2 requirement.
Also added bm.getCurrentViewport() and switched 'viewport --scale'-
without-size to read from it (more reliable than page.viewportSize() on
headed/transition contexts).
Tests pass: exit 0, no failures. Build clean.
* test: integration coverage for load-html, screenshot --selector, viewport --scale, replay, aliases
Adds 28 Playwright-integration tests that close the coverage gap flagged
by the ship-workflow coverage audit (50% → expected ~80%+).
**load-html (12 tests):**
- happy path loads HTML file, page text matches
- bare HTML fragments (<div>...</div>) accepted, not just full documents
- missing file arg throws usage
- non-.html extension rejected by allowlist
- /etc/passwd.html rejected by safe-dirs policy
- ENOENT path rejected with actionable "not found" error
- directory target rejected
- binary file (PNG magic bytes) disguised as .html rejected by magic-byte check
- UTF-8 BOM stripped before magic-byte check — BOM-prefixed HTML accepted
- --wait-until networkidle exercises non-default branch
- invalid --wait-until value rejected
- unknown flag rejected
**screenshot --selector (5 tests):**
- --selector flag captures element, validates Screenshot saved (element)
- conflicts with positional selector (both = error)
- conflicts with --clip (mutually exclusive)
- composes with --base64 (returns data:image/png;base64,...)
- missing value throws usage
**viewport --scale (5 tests):**
- WxH --scale 2 produces PNG with 2x element dimensions (parses IHDR bytes 16-23)
- --scale without WxH keeps current size + applies scale
- non-finite value (abc) throws "not a finite number"
- out-of-range (4, 0.5) throws "between 1 and 3"
- missing value throws
**setContent replay across context recreation (3 tests):**
- load-html → viewport --scale 2: content survives (hits setTabContent replay path)
- double cycle 2x → 1.5x: content still survives (proves TabSession rehydration)
- goto after load-html clears replay: subsequent viewport --scale does NOT
resurrect the stale HTML (validates the onMainFrameNavigated fix)
**Command aliases (2 tests):**
- setcontent routes to load-html via chain canonicalization
- set-content (hyphenated) also routes — both end-to-end through chain dispatch
Fixture paths use /tmp (SAFE_DIRECTORIES entry) instead of $TMPDIR which is
/var/folders/... on macOS and outside the safe-dirs boundary. Chain result
labels use rawName→name format when an alias is resolved (matches the
meta-commands.ts chain refactor).
Full suite: exit 0, 223/223 pass.
* docs: update BROWSER.md + CHANGELOG for v1.1.0.0
BROWSER.md:
- Command reference table updated: goto now lists file:// support,
load-html added to Navigate row, viewport flagged with --scale
option, screenshot row shows --selector + --base64 flags
- Screenshot modes table adds the fifth mode (element crop via
--selector flag) and notes the tag-selector-not-caught-positionally
gotcha
- New "Retina screenshots — viewport --scale" subsection explains
deviceScaleFactor mechanics, context recreation side effects, and
headed-mode rejection
- New "Loading local HTML — goto file:// vs load-html" subsection
explains the two paths, their tradeoffs (URL state, relative asset
resolution), the safe-dirs policy, extension allowlist + magic-byte
sniff, 50MB cap, setContent replay across recreateContext, and the
alias routing (setcontent → load-html before scope check)
CHANGELOG.md (v1.1.0.0 security section expanded, no existing content
removed):
- State files cannot smuggle HTML or forge tab ownership (allowlist
on disk-loaded page fields)
- Audit log records aliasOf when a canonical command was reached via
an alias (setcontent → load-html)
- load-html content clears on real navigations (clicks, form submits,
JS redirects) — not just explicit goto. Also notes SPA query/fragment
preservation for goto file://
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
222 lines
7.6 KiB
Cheetah
222 lines
7.6 KiB
Cheetah
---
|
|
name: browse
|
|
preamble-tier: 1
|
|
version: 1.1.0
|
|
description: |
|
|
Fast headless browser for QA testing and site dogfooding. Navigate any URL, interact with
|
|
elements, verify page state, diff before/after actions, take annotated screenshots, check
|
|
responsive layouts, test forms and uploads, handle dialogs, and assert element states.
|
|
~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a
|
|
user flow, or file a bug with evidence. Use when asked to "open in browser", "test the
|
|
site", "take a screenshot", or "dogfood this". (gstack)
|
|
triggers:
|
|
- browse a page
|
|
- headless browser
|
|
- take page screenshot
|
|
allowed-tools:
|
|
- Bash
|
|
- Read
|
|
- AskUserQuestion
|
|
|
|
---
|
|
|
|
{{PREAMBLE}}
|
|
|
|
# browse: QA Testing & Dogfooding
|
|
|
|
Persistent headless Chromium. First call auto-starts (~3s), then ~100ms per command.
|
|
State persists between calls (cookies, tabs, login sessions).
|
|
|
|
{{BROWSE_SETUP}}
|
|
|
|
## Core QA Patterns
|
|
|
|
### 1. Verify a page loads correctly
|
|
```bash
|
|
$B goto https://yourapp.com
|
|
$B text # content loads?
|
|
$B console # JS errors?
|
|
$B network # failed requests?
|
|
$B is visible ".main-content" # key elements present?
|
|
```
|
|
|
|
### 2. Test a user flow
|
|
```bash
|
|
$B goto https://app.com/login
|
|
$B snapshot -i # see all interactive elements
|
|
$B fill @e3 "user@test.com"
|
|
$B fill @e4 "password"
|
|
$B click @e5 # submit
|
|
$B snapshot -D # diff: what changed after submit?
|
|
$B is visible ".dashboard" # success state present?
|
|
```
|
|
|
|
### 3. Verify an action worked
|
|
```bash
|
|
$B snapshot # baseline
|
|
$B click @e3 # do something
|
|
$B snapshot -D # unified diff shows exactly what changed
|
|
```
|
|
|
|
### 4. Visual evidence for bug reports
|
|
```bash
|
|
$B snapshot -i -a -o /tmp/annotated.png # labeled screenshot
|
|
$B screenshot /tmp/bug.png # plain screenshot
|
|
$B console # error log
|
|
```
|
|
|
|
### 5. Find all clickable elements (including non-ARIA)
|
|
```bash
|
|
$B snapshot -C # finds divs with cursor:pointer, onclick, tabindex
|
|
$B click @c1 # interact with them
|
|
```
|
|
|
|
### 6. Assert element states
|
|
```bash
|
|
$B is visible ".modal"
|
|
$B is enabled "#submit-btn"
|
|
$B is disabled "#submit-btn"
|
|
$B is checked "#agree-checkbox"
|
|
$B is editable "#name-field"
|
|
$B is focused "#search-input"
|
|
$B js "document.body.textContent.includes('Success')"
|
|
```
|
|
|
|
### 7. Test responsive layouts
|
|
```bash
|
|
$B responsive /tmp/layout # mobile + tablet + desktop screenshots
|
|
$B viewport 375x812 # or set specific viewport
|
|
$B screenshot /tmp/mobile.png
|
|
```
|
|
|
|
### 8. Test file uploads
|
|
```bash
|
|
$B upload "#file-input" /path/to/file.pdf
|
|
$B is visible ".upload-success"
|
|
```
|
|
|
|
### 9. Test dialogs
|
|
```bash
|
|
$B dialog-accept "yes" # set up handler
|
|
$B click "#delete-button" # trigger dialog
|
|
$B dialog # see what appeared
|
|
$B snapshot -D # verify deletion happened
|
|
```
|
|
|
|
### 10. Compare environments
|
|
```bash
|
|
$B diff https://staging.app.com https://prod.app.com
|
|
```
|
|
|
|
### 11. Show screenshots to the user
|
|
After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible.
|
|
|
|
### 12. Render local HTML (no HTTP server needed)
|
|
Two paths, pick the cleaner one:
|
|
```bash
|
|
# HTML file on disk → goto file:// (absolute, or cwd-relative)
|
|
$B goto file:///tmp/report.html
|
|
$B goto file://./docs/page.html # cwd-relative
|
|
$B goto file://~/Documents/page.html # home-relative
|
|
|
|
# HTML generated in memory → load-html reads the file into setContent
|
|
echo '<div class="tweet">hello</div>' > /tmp/tweet.html
|
|
$B load-html /tmp/tweet.html
|
|
```
|
|
|
|
`goto file://...` is usually cleaner (URL is saved in state, relative asset URLs resolve against the file's dir, scale changes replay naturally). `load-html` uses `page.setContent()` — URL stays `about:blank`, but the content survives `viewport --scale` via in-memory replay. Both are scoped to files under cwd or `$TMPDIR`.
|
|
|
|
### 13. Retina screenshots (deviceScaleFactor)
|
|
```bash
|
|
$B viewport 480x600 --scale 2 # 2x deviceScaleFactor
|
|
$B load-html /tmp/tweet.html # or: $B goto file://./tweet.html
|
|
$B screenshot /tmp/out.png --selector .tweet-card
|
|
# → /tmp/out.png is 2x the pixel dimensions of the element
|
|
```
|
|
Scale must be 1-3 (gstack policy cap). Changing `--scale` recreates the browser context; refs from `snapshot` are invalidated (rerun `snapshot`), but `load-html` content is replayed automatically. Not supported in headed mode.
|
|
|
|
## Puppeteer → browse cheatsheet
|
|
|
|
Migrating from Puppeteer? Here's the 1:1 mapping for the core workflow:
|
|
|
|
| Puppeteer | browse |
|
|
|---|---|
|
|
| `await page.goto(url)` | `$B goto <url>` |
|
|
| `await page.setContent(html)` | `$B load-html <file>` (or `$B goto file://<abs>`) |
|
|
| `await page.setViewport({width, height})` | `$B viewport WxH` |
|
|
| `await page.setViewport({width, height, deviceScaleFactor: 2})` | `$B viewport WxH --scale 2` |
|
|
| `await (await page.$('.x')).screenshot({path})` | `$B screenshot <path> --selector .x` |
|
|
| `await page.screenshot({fullPage: true, path})` | `$B screenshot <path>` (full page default) |
|
|
| `await page.screenshot({clip: {x, y, w, h}, path})` | `$B screenshot <path> --clip x,y,w,h` |
|
|
|
|
Worked example (the tweet-renderer flow — Puppeteer → browse):
|
|
|
|
```bash
|
|
# Generate HTML in memory, render at 2x scale, screenshot the tweet card.
|
|
echo '<div class="tweet-card" style="width:400px;height:200px;background:#1da1f2;color:white;padding:20px">hello</div>' > /tmp/tweet.html
|
|
$B viewport 480x600 --scale 2
|
|
$B load-html /tmp/tweet.html
|
|
$B screenshot /tmp/out.png --selector .tweet-card
|
|
# /tmp/out.png is 800x400 px, crisp (2x deviceScaleFactor).
|
|
```
|
|
|
|
Aliases: typing `setcontent` or `set-content` routes to `load-html` automatically. Typing a typo (`load-htm`) returns `Did you mean 'load-html'?`.
|
|
|
|
## User Handoff
|
|
|
|
When you hit something you can't handle in headless mode (CAPTCHA, complex auth, multi-factor
|
|
login), hand off to the user:
|
|
|
|
```bash
|
|
# 1. Open a visible Chrome at the current page
|
|
$B handoff "Stuck on CAPTCHA at login page"
|
|
|
|
# 2. Tell the user what happened (via AskUserQuestion)
|
|
# "I've opened Chrome at the login page. Please solve the CAPTCHA
|
|
# and let me know when you're done."
|
|
|
|
# 3. When user says "done", re-snapshot and continue
|
|
$B resume
|
|
```
|
|
|
|
**When to use handoff:**
|
|
- CAPTCHAs or bot detection
|
|
- Multi-factor authentication (SMS, authenticator app)
|
|
- OAuth flows that require user interaction
|
|
- Complex interactions the AI can't handle after 3 attempts
|
|
|
|
The browser preserves all state (cookies, localStorage, tabs) across the handoff.
|
|
After `resume`, you get a fresh snapshot of wherever the user left off.
|
|
|
|
## Snapshot Flags
|
|
|
|
{{SNAPSHOT_FLAGS}}
|
|
|
|
## CSS Inspector & Style Modification
|
|
|
|
### Inspect element CSS
|
|
```bash
|
|
$B inspect .header # full CSS cascade for selector
|
|
$B inspect # latest picked element from sidebar
|
|
$B inspect --all # include user-agent stylesheet rules
|
|
$B inspect --history # show modification history
|
|
```
|
|
|
|
### Modify styles live
|
|
```bash
|
|
$B style .header background-color #1a1a1a # modify CSS property
|
|
$B style --undo # revert last change
|
|
$B style --undo 2 # revert specific change
|
|
```
|
|
|
|
### Clean screenshots
|
|
```bash
|
|
$B cleanup --all # remove ads, cookies, sticky, social
|
|
$B cleanup --ads --cookies # selective cleanup
|
|
$B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero.png
|
|
```
|
|
|
|
## Full Command List
|
|
|
|
{{COMMAND_REFERENCE}}
|