From b73f364411fe933a2e9d70c1a469621be92a9a34 Mon Sep 17 00:00:00 2001
From: Garry Tan
Date: Wed, 8 Apr 2026 00:41:55 -0700
Subject: [PATCH 1/2] feat: browser data platform for AI agents (v0.16.0.0) (#907)

* refactor: extract path-security.ts shared module

validateOutputPath, validateReadPath, and SAFE_DIRECTORIES were duplicated across write-commands.ts, meta-commands.ts, and read-commands.ts. Extract to a single shared module with re-exports for backward compatibility.

Also adds validateTempPath() for the upcoming GET /file endpoint (TEMP_DIR only, not cwd, to prevent remote agents from reading project files).

Co-Authored-By: Claude Opus 4.6 (1M context)

* feat: default paired agents to full access, split SCOPE_CONTROL

The trust boundary for paired agents is the pairing ceremony itself, not the scope. An agent with write scope can already click anything and navigate anywhere. Gating js/cookies behind --admin was security theater.

Changes:
- Default pair scopes: read+write+admin+meta (was read+write)
- New SCOPE_CONTROL for browser-wide destructive ops (stop, restart, disconnect, state, handoff, resume, connect)
- --admin flag now grants control scope (backward compat)
- New --restrict flag for limited access (e.g., --restrict read)
- Updated hint text: "re-pair with --control" instead of "--admin"

Co-Authored-By: Claude Opus 4.6 (1M context)

* feat: add media and data commands for page content extraction

media command: discovers all img/video/audio/background-image elements on the page. Returns JSON with URLs, dimensions, srcset, loading state, HLS/DASH detection. Supports --images/--videos/--audio filters and optional CSS selector scoping.

data command: extracts structured data embedded in pages (JSON-LD, Open Graph, Twitter Cards, meta tags). One command returns product prices, article metadata, social share info without DOM scraping.

Both are READ scope with untrusted content wrapping. Shared media-extract.ts helper for reuse by the upcoming scrape command.
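As an illustration of the srcset handling the media command's discovery step implies, a hedged sketch follows. The names, the interface shape, and the density-to-width mapping are assumptions for this sketch, not the actual media-extract.ts code:

```typescript
// Hypothetical sketch: parse a srcset attribute and pick the
// highest-resolution candidate. Illustrative only -- the real
// media-extract.ts implementation may differ.
interface SrcsetCandidate {
  url: string;
  width: number; // from a "123w" descriptor; "2x" density mapped via an assumed scale
}

function parseSrcset(srcset: string): SrcsetCandidate[] {
  return srcset
    .split(',')
    .map(part => part.trim())
    .filter(Boolean)
    .map(part => {
      const [url, descriptor] = part.split(/\s+/);
      let width = 0;
      if (descriptor?.endsWith('w')) {
        width = parseInt(descriptor, 10); // "480w" -> 480
      } else if (descriptor?.endsWith('x')) {
        // Assumption: treat density descriptors as density * 1000 so they
        // sort sensibly against width descriptors.
        width = Math.round(parseFloat(descriptor) * 1000);
      }
      return { url, width };
    });
}

function bestCandidate(srcset: string): string | undefined {
  const candidates = parseSrcset(srcset);
  candidates.sort((a, b) => b.width - a.width); // widest first
  return candidates[0]?.url;
}
```

A scraper that wants the full-resolution asset would call bestCandidate() on each discovered img's srcset and fall back to src when srcset is absent.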
Co-Authored-By: Claude Opus 4.6 (1M context)

* feat: add download, scrape, and archive commands

download: fetch any URL or @ref element to disk using browser session cookies via page.request.fetch(). Supports blob: URLs via in-page base64 conversion. --base64 flag returns inline data URI (cap 10MB). Detects HLS/DASH and rejects with yt-dlp hint.

scrape: bulk media download composing media discovery + download loop. Sequential with 100ms delay, URL deduplication, configurable --limit. Writes manifest.json with per-file metadata for machine consumption.

archive: saves complete page as MHTML via CDP Page.captureSnapshot. No silent fallback -- errors clearly if CDP unavailable.

All three are WRITE scope (write to disk, blocked in watch mode).

Co-Authored-By: Claude Opus 4.6 (1M context)

* feat: add GET /file endpoint for remote agent file retrieval

Remote paired agents can now retrieve downloaded files over HTTP. TEMP_DIR only (not cwd) to prevent project file exfiltration.

- Bearer token auth (root or scoped with read scope)
- Path validation via validateTempPath() (symlink-aware)
- 200MB size cap
- Extension-based MIME detection
- Zero-copy streaming via Bun.file()

Co-Authored-By: Claude Opus 4.6 (1M context)

* feat: add scroll --times N for automated repeated scrolling

Extends the scroll command with --times N flag for infinite feed scraping. Scrolls N times with configurable --wait delay (default 1000ms) between each scroll for content loading.

Usage:
  scroll --times 10
  scroll --times 5 --wait 2000
  scroll --times 3 .feed-container

Composable with scrape: scroll to load content, then scrape images.

Co-Authored-By: Claude Opus 4.6 (1M context)

* feat: add network response body capture (--capture/--export/--bodies)

The killer feature for social media scraping.
Extends the existing network command to intercept API response bodies:

  network --capture [--filter graphql]   # start capturing
  network --capture stop                 # stop
  network --export /tmp/api.jsonl        # export as JSONL
  network --bodies                       # show summary

Uses page.on('response') listener with URL pattern filtering. SizeCappedBuffer (50MB total, 5MB per-entry cap) evicts oldest entries when full. Binary responses stored as base64, text as-is.

This lets agents tap Instagram's GraphQL API, TikTok's hydration data, and any SPA's internal API responses instead of fragile DOM scraping.

Co-Authored-By: Claude Opus 4.6 (1M context)

* feat: add screenshot --base64 for inline image return

Returns data:image/png;base64,... instead of writing to disk. Cap at 10MB. Works with all screenshot modes (element, clip, viewport). Eliminates the two-step screenshot+file-serve dance for remote agents.

Co-Authored-By: Claude Opus 4.6 (1M context)

* test: add data platform tests and media fixture

Tests for SizeCappedBuffer (eviction, export, summary), validateTempPath (TEMP_DIR only, rejects cwd), command registration (all new commands in correct scope sets), and MIME mapping source checks.

Rich HTML fixture with: standard images, lazy-loaded images, srcset, video with sources + HLS, audio, CSS background-images, JSON-LD, Open Graph, Twitter Cards, and meta tags.

Co-Authored-By: Claude Opus 4.6 (1M context)

* docs: regenerate SKILL.md with Extraction category

Add Extraction category to browse command table ordering. Regenerate SKILL.md files to include media, data, download, scrape, archive commands in the generated documentation.
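A minimal sketch of the eviction behavior the network-capture commit describes (total cap, per-entry cap, oldest-first eviction). Class and field names here are illustrative, not the network-capture.ts source:

```typescript
// Illustrative size-capped FIFO buffer. The real command uses
// 50MB total / 5MB per entry; the caps are constructor args here
// so the behavior is easy to exercise with small numbers.
class SizeCappedBuffer<T extends { body: string }> {
  private entries: T[] = [];
  private total = 0;

  constructor(
    private readonly maxTotal: number, // total budget across all entries
    private readonly maxEntry: number, // per-entry cap
  ) {}

  push(entry: T): boolean {
    const size = entry.body.length;
    if (size > this.maxEntry) return false; // oversized entries are dropped
    // Evict oldest entries until the new one fits.
    while (this.total + size > this.maxTotal && this.entries.length > 0) {
      const evicted = this.entries.shift()!;
      this.total -= evicted.body.length;
    }
    this.entries.push(entry);
    this.total += size;
    return true;
  }

  get length(): number { return this.entries.length; }
  toArray(): readonly T[] { return this.entries; }
}
```

Export-as-JSONL then reduces to serializing toArray() one entry per line.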
Co-Authored-By: Claude Opus 4.6 (1M context)

* chore: bump version and changelog (v0.16.0.0)

Co-Authored-By: Claude Opus 4.6

---------

Co-authored-by: Claude Opus 4.6 (1M context)
---
 CHANGELOG.md                          |  18 ++
 SKILL.md                              |   9 +
 VERSION                               |   2 +-
 browse/SKILL.md                       |   9 +
 browse/src/cli.ts                     |  11 +-
 browse/src/commands.ts                |   9 +
 browse/src/media-extract.ts           | 177 +++++++++++++++
 browse/src/meta-commands.ts           |  93 ++++----
 browse/src/network-capture.ts         | 179 ++++++++++++++++
 browse/src/path-security.ts           | 103 +++++++++
 browse/src/read-commands.ts           | 151 ++++++++++---
 browse/src/server.ts                  |  64 +++++-
 browse/src/token-registry.ts          |  16 +-
 browse/src/write-commands.ts          | 296 ++++++++++++++++++++++----
 browse/test/data-platform.test.ts     | 176 +++++++++++++++
 browse/test/fixtures/media-page.html  |  67 ++++++
 browse/test/security-audit-r2.test.ts |  26 +--
 browse/test/tab-isolation.test.ts     |   8 +-
 browse/test/token-registry.test.ts    |  14 +-
 scripts/resolvers/browse.ts           |   2 +-
 20 files changed, 1264 insertions(+), 166 deletions(-)
 create mode 100644 browse/src/media-extract.ts
 create mode 100644 browse/src/network-capture.ts
 create mode 100644 browse/src/path-security.ts
 create mode 100644 browse/test/data-platform.test.ts
 create mode 100644 browse/test/fixtures/media-page.html

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 137b1462..a4bb4c62 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,23 @@
 # Changelog
 
+## [0.16.0.0] - 2026-04-07
+
+### Added
+- **Browser data platform.** Six new browse commands that turn gstack browser from "a thing that clicks buttons" into a full scraping and data extraction tool for AI agents.
+- `media` command: discover every image, video, and audio element on a page. Returns URLs, dimensions, srcset, lazy-load state, and detects HLS/DASH streams. Filter with `--images`, `--videos`, `--audio`, or scope with a CSS selector.
+- `data` command: extract structured data embedded in pages. JSON-LD (product prices, recipes, events), Open Graph, Twitter Cards, and meta tags. One command gives you what used to take 50 lines of DOM scraping.
+- `download` command: fetch any URL or `@ref` element to disk using the browser's session cookies. Handles blob URLs via in-page base64 conversion. `--base64` flag returns inline data URI for remote agents. Detects HLS/DASH and tells you to use yt-dlp instead of silently failing.
+- `scrape` command: bulk download all media from a page. Combines `media` discovery + `download` in a loop with URL deduplication, configurable limits, and a `manifest.json` for machine consumption.
+- `archive` command: save complete pages as MHTML via CDP. One command, full page with all resources.
+- `scroll --times N`: automated repeated scrolling for infinite feed content loading. Configurable delay between scrolls with `--wait`.
+- `screenshot --base64`: return screenshots as inline data URIs instead of file paths. Eliminates the two-step screenshot-then-file-serve dance for remote agents.
+- **Network response body capture.** `network --capture` intercepts API response bodies so agents get structured JSON instead of fragile DOM scraping. Filter by URL pattern (`--filter graphql`), export as JSONL (`--export`), view summary (`--bodies`). 50MB size-capped buffer with automatic eviction.
+- `GET /file` endpoint: remote paired agents can now retrieve downloaded files (images, scraped media, screenshots) over HTTP. TEMP_DIR only to prevent project file exfiltration. Bearer token auth, MIME detection, zero-copy streaming via `Bun.file()`.
+
+### Changed
+- Paired agents now get full access by default (read+write+admin+meta). The trust boundary is the pairing ceremony, not the scope. An agent that can click any button doesn't gain meaningful attack surface from also being able to run `js`. Browser-wide destructive commands (stop, restart, disconnect) moved to new `control` scope, still opt-in via `--control`.
+- Path validation extracted to shared `path-security.ts` module. Was duplicated across three files with slightly different implementations. Now one source of truth with `validateOutputPath`, `validateReadPath`, and `validateTempPath`.
+
 ## [0.15.16.0] - 2026-04-06
 
 ### Added
diff --git a/SKILL.md b/SKILL.md
index 3d951a67..94ba826b 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -773,11 +773,20 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
 | Command | Description |
 |---------|-------------|
 | `accessibility` | Full ARIA tree |
+| `data [--jsonld|--og|--meta|--twitter]` | Structured data: JSON-LD, Open Graph, Twitter Cards, meta tags |
 | `forms` | Form fields as JSON |
 | `html [selector]` | innerHTML of selector (throws if not found), or full page HTML if no selector given |
 | `links` | All links as "text → href" |
+| `media [--images|--videos|--audio] [selector]` | All media elements (images, videos, audio) with URLs, dimensions, types |
 | `text` | Cleaned page text |
 
+### Extraction
+| Command | Description |
+|---------|-------------|
+| `archive [path]` | Save complete page as MHTML via CDP |
+| `download [path] [--base64]` | Download URL or media element to disk using browser cookies |
+| `scrape [--selector sel] [--dir path] [--limit N]` | Bulk download all media from page. Writes manifest.json |
+
 ### Interaction
 | Command | Description |
 |---------|-------------|
diff --git a/VERSION b/VERSION
index 006a1444..70d644c0 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-0.15.16.0
+0.16.0.0
diff --git a/browse/SKILL.md b/browse/SKILL.md
index 5bc9b02b..420e2b0b 100644
--- a/browse/SKILL.md
+++ b/browse/SKILL.md
@@ -665,11 +665,20 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
 | Command | Description |
 |---------|-------------|
 | `accessibility` | Full ARIA tree |
+| `data [--jsonld|--og|--meta|--twitter]` | Structured data: JSON-LD, Open Graph, Twitter Cards, meta tags |
 | `forms` | Form fields as JSON |
 | `html [selector]` | innerHTML of selector (throws if not found), or full page HTML if no selector given |
 | `links` | All links as "text → href" |
+| `media [--images|--videos|--audio] [selector]` | All media elements (images, videos, audio) with URLs, dimensions, types |
 | `text` | Cleaned page text |
 
+### Extraction
+| Command | Description |
+|---------|-------------|
+| `archive [path]` | Save complete page as MHTML via CDP |
+| `download [path] [--base64]` | Download URL or media element to disk using browser cookies |
+| `scrape [--selector sel] [--dir path] [--limit N]` | Bulk download all media from page. Writes manifest.json |
+
 ### Interaction
 | Command | Description |
 |---------|-------------|
diff --git a/browse/src/cli.ts b/browse/src/cli.ts
index bbd5c733..0f6210a2 100644
--- a/browse/src/cli.ts
+++ b/browse/src/cli.ts
@@ -566,7 +566,7 @@ COMMAND REFERENCE:
   New tab: {"command": "newtab", "args": ["URL"]}
 
 SCOPES: ${scopeDesc}.
-${scopes.includes('admin') ? '' : `To get admin access (JS, cookies, storage), ask the user to re-pair with --admin.\n`}
+${scopes.includes('control') ? '' : `To get browser control access (stop, restart, disconnect), ask the user to re-pair with --control.\n`}
 TOKEN: Expires ${expiresAt}. Revoke: ask the user to run $B tunnel revoke
@@ -591,10 +591,13 @@ function hasFlag(args: string[], flag: string): boolean {
 async function handlePairAgent(state: ServerState, args: string[]): Promise {
   const clientName = parseFlag(args, '--client') || `remote-${Date.now()}`;
   const domains = parseFlag(args, '--domain')?.split(',').map(d => d.trim());
-  const admin = hasFlag(args, '--admin');
+  const control = hasFlag(args, '--control') || hasFlag(args, '--admin');
+  const restrict = parseFlag(args, '--restrict');
   const localHost = parseFlag(args, '--local');
 
   // Call POST /pair to create a setup key
+  // Default: full access (read+write+admin+meta). --control adds browser-wide ops.
+  // --restrict limits: --restrict read (read-only), --restrict "read,write" (no admin)
   const pairResp = await fetch(`http://127.0.0.1:${state.port}/pair`, {
     method: 'POST',
     headers: {
@@ -603,9 +606,9 @@ async function handlePairAgent(state: ServerState, args: string[]): Promise
 s.trim()) } : {}), }),
     signal: AbortSignal.timeout(5000),
   });
diff --git a/browse/src/commands.ts b/browse/src/commands.ts
index ceb089f3..eacdf0cd 100644
--- a/browse/src/commands.ts
+++ b/browse/src/commands.ts
@@ -16,6 +16,7 @@ export const READ_COMMANDS = new Set([
   'console', 'network', 'cookies', 'storage', 'perf', 'dialog',
   'is', 'inspect',
+  'media', 'data',
 ]);
 
 export const WRITE_COMMANDS = new Set([
@@ -24,6 +25,7 @@ export const WRITE_COMMANDS = new Set([
   'viewport', 'cookie', 'cookie-import', 'cookie-import-browser', 'header', 'useragent',
   'upload', 'dialog-accept', 'dialog-dismiss', 'style', 'cleanup', 'prettyscreenshot',
+  'download', 'scrape', 'archive',
 ]);
 
 export const META_COMMANDS = new Set([
@@ -46,6 +48,7 @@ export const ALL_COMMANDS = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...MET
 export const PAGE_CONTENT_COMMANDS = new Set([
   'text', 'html', 'links', 'forms', 'accessibility', 'attrs', 'console', 'dialog',
+  'media', 'data',
 ]);
 
 /** Wrap output from untrusted-content commands with trust boundary markers */
@@ -70,6 +73,8 @@ export const COMMAND_DESCRIPTIONS: Record' },
 'eval': { category: 'Inspection', description: 'Run JavaScript from file and return result as string (path must be under /tmp or cwd)', usage: 'eval ' },
@@ -100,6 +105,10 @@ export const COMMAND_DESCRIPTIONS: Record' },
 'dialog-accept': { category: 'Interaction', description: 'Auto-accept next alert/confirm/prompt. Optional text is sent as the prompt response', usage: 'dialog-accept [text]' },
 'dialog-dismiss': { category: 'Interaction', description: 'Auto-dismiss next dialog' },
+// Data extraction
+'download': { category: 'Extraction', description: 'Download URL or media element to disk using browser cookies', usage: 'download [path] [--base64]' },
+'scrape': { category: 'Extraction', description: 'Bulk download all media from page. Writes manifest.json', usage: 'scrape [--selector sel] [--dir path] [--limit N]' },
+'archive': { category: 'Extraction', description: 'Save complete page as MHTML via CDP', usage: 'archive [path]' },
 // Visual
 'screenshot': { category: 'Visual', description: 'Save screenshot (supports element crop via CSS/@ref, --clip region, --viewport)', usage: 'screenshot [--viewport] [--clip x,y,w,h] [selector|@ref] [path]' },
 'pdf': { category: 'Visual', description: 'Save as PDF', usage: 'pdf [path]' },
diff --git a/browse/src/media-extract.ts b/browse/src/media-extract.ts
new file mode 100644
index 00000000..4ff9b252
--- /dev/null
+++ b/browse/src/media-extract.ts
@@ -0,0 +1,177 @@
+/**
+ * Media extraction helper — shared between `media` (read) and `scrape` (write) commands.
+ *
+ * Runs page.evaluate() to discover all media elements on the page:
+ * - with src, srcset, currentSrc, alt, dimensions, loading, data-src
+ * -
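The containment property validateTempPath() is described as enforcing (serve from TEMP_DIR only, symlink-aware) can be sketched as follows. The function name and structure are illustrative, not the path-security.ts source:

```typescript
import { realpathSync } from 'node:fs';
import { resolve, sep } from 'node:path';

// Illustrative sketch only. A production version must also realpath the
// resolved file once it exists, so a symlink placed inside TEMP_DIR
// cannot point back out of it.
function isInsideTempDir(tempDir: string, requested: string): boolean {
  const root = realpathSync(tempDir);      // collapse symlinks on the root itself
  const target = resolve(root, requested); // normalize any ../ segments
  return target === root || target.startsWith(root + sep);
}
```

The startsWith(root + sep) check (rather than startsWith(root)) matters: without the trailing separator, /tmp-evil would pass a check rooted at /tmp.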