chore: merge main, re-bump version (v0.16.1.0)

Main advanced to v0.16.0.0 (browser data platform). Re-bumped our
security fix on top as v0.16.1.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Garry Tan
2026-04-08 07:02:33 -10:00
20 changed files with 1265 additions and 167 deletions
+19 -1
@@ -1,10 +1,28 @@
# Changelog
## [0.15.17.0] - 2026-04-07
## [0.16.1.0] - 2026-04-08
### Fixed
- Cookie picker no longer leaks the browse server auth token. Previously, opening the cookie picker page exposed the master bearer token in the HTML source, letting any local process extract it and execute arbitrary JavaScript in your browser session. Now uses a one-time code exchange with an HttpOnly session cookie. The token never appears in HTML, URLs, or browser history. (Reported by Horoshi at Vagabond Research, CVSS 7.8)
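The one-time code exchange can be pictured with a minimal sketch. This is illustrative only (names like `issueCode`/`exchangeCode` are hypothetical, not the actual gstack implementation): the page receives a short-lived, single-use code instead of the bearer token, and the server swaps it for an HttpOnly session cookie, so the token itself never appears in HTML, URLs, or history.

```typescript
// Hypothetical sketch of a one-time code exchange. Names are illustrative;
// the real fix's internals are not shown in this diff.
import { randomBytes } from "crypto";

const pendingCodes = new Map<string, { expires: number }>();

// Server mints a short-lived code and embeds it in the page instead of the token.
function issueCode(): string {
  const code = randomBytes(16).toString("hex");
  pendingCodes.set(code, { expires: Date.now() + 30_000 }); // 30s TTL
  return code;
}

// Page exchanges the code once; the server would respond with an HttpOnly cookie.
function exchangeCode(code: string): boolean {
  const entry = pendingCodes.get(code);
  pendingCodes.delete(code); // single use: consumed even on failure
  return !!entry && entry.expires > Date.now();
}

const code = issueCode();
console.log(exchangeCode(code)); // true — first exchange succeeds
console.log(exchangeCode(code)); // false — replay fails
```

A local process that scrapes the HTML after the page loads finds only an already-consumed code, not a reusable credential.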
## [0.16.0.0] - 2026-04-07
### Added
- **Browser data platform.** Six new browse commands that turn gstack browser from "a thing that clicks buttons" into a full scraping and data extraction tool for AI agents.
- `media` command: discover every image, video, and audio element on a page. Returns URLs, dimensions, srcset, lazy-load state, and detects HLS/DASH streams. Filter with `--images`, `--videos`, `--audio`, or scope with a CSS selector.
- `data` command: extract structured data embedded in pages. JSON-LD (product prices, recipes, events), Open Graph, Twitter Cards, and meta tags. One command gives you what used to take 50 lines of DOM scraping.
- `download` command: fetch any URL or `@ref` element to disk using the browser's session cookies. Handles blob URLs via in-page base64 conversion. `--base64` flag returns inline data URI for remote agents. Detects HLS/DASH and tells you to use yt-dlp instead of silently failing.
- `scrape` command: bulk download all media from a page. Combines `media` discovery + `download` in a loop with URL deduplication, configurable limits, and a `manifest.json` for machine consumption.
- `archive` command: save complete pages as MHTML via CDP. One command, full page with all resources.
- `scroll --times N`: automated repeated scrolling for infinite feed content loading. Configurable delay between scrolls with `--wait`.
- `screenshot --base64`: return screenshots as inline data URIs instead of file paths. Eliminates the two-step screenshot-then-file-serve dance for remote agents.
- **Network response body capture.** `network --capture` intercepts API response bodies so agents get structured JSON instead of fragile DOM scraping. Filter by URL pattern (`--filter graphql`), export as JSONL (`--export`), view summary (`--bodies`). 50MB size-capped buffer with automatic eviction.
- `GET /file` endpoint: remote paired agents can now retrieve downloaded files (images, scraped media, screenshots) over HTTP. TEMP_DIR only to prevent project file exfiltration. Bearer token auth, MIME detection, zero-copy streaming via `Bun.file()`.
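The `data` command's JSON-LD extraction can be sketched outside the browser. This is illustrative, not the command's actual internals (which run in-page against the live DOM): here a regex over an HTML string stands in for DOM access, and the sample markup is invented.

```typescript
// Illustrative sketch of JSON-LD extraction (not the actual `data` command code).
const sampleHtml = `
  <script type="application/ld+json">
    {"@type":"Product","name":"Widget","offers":{"price":"9.99"}}
  </script>`;

function extractJsonLd(html: string): unknown[] {
  const re = /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi;
  const out: unknown[] = [];
  for (const m of html.matchAll(re)) {
    try { out.push(JSON.parse(m[1])); } catch { /* skip malformed blocks */ }
  }
  return out;
}

console.log(extractJsonLd(sampleHtml)); // one parsed Product object
```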
### Changed
- Paired agents now get full access by default (read+write+admin+meta). The trust boundary is the pairing ceremony, not the scope. An agent that can click any button doesn't gain meaningful attack surface from also being able to run `js`. Browser-wide destructive commands (stop, restart, disconnect) moved to new `control` scope, still opt-in via `--control`.
- Path validation extracted to shared `path-security.ts` module. Was duplicated across three files with slightly different implementations. Now one source of truth with `validateOutputPath`, `validateReadPath`, and `validateTempPath`.
## [0.15.16.0] - 2026-04-06
### Added
+9
@@ -773,11 +773,20 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
| Command | Description |
|---------|-------------|
| `accessibility` | Full ARIA tree |
| `data [--jsonld|--og|--meta|--twitter]` | Structured data: JSON-LD, Open Graph, Twitter Cards, meta tags |
| `forms` | Form fields as JSON |
| `html [selector]` | innerHTML of selector (throws if not found), or full page HTML if no selector given |
| `links` | All links as "text → href" |
| `media [--images|--videos|--audio] [selector]` | All media elements (images, videos, audio) with URLs, dimensions, types |
| `text` | Cleaned page text |
### Extraction
| Command | Description |
|---------|-------------|
| `archive [path]` | Save complete page as MHTML via CDP |
| `download <url|@ref> [path] [--base64]` | Download URL or media element to disk using browser cookies |
| `scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]` | Bulk download all media from page. Writes manifest.json |
### Interaction
| Command | Description |
|---------|-------------|
+1 -1
@@ -1 +1 @@
0.15.17.0
0.16.1.0
+9
@@ -665,11 +665,20 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
| Command | Description |
|---------|-------------|
| `accessibility` | Full ARIA tree |
| `data [--jsonld|--og|--meta|--twitter]` | Structured data: JSON-LD, Open Graph, Twitter Cards, meta tags |
| `forms` | Form fields as JSON |
| `html [selector]` | innerHTML of selector (throws if not found), or full page HTML if no selector given |
| `links` | All links as "text → href" |
| `media [--images|--videos|--audio] [selector]` | All media elements (images, videos, audio) with URLs, dimensions, types |
| `text` | Cleaned page text |
### Extraction
| Command | Description |
|---------|-------------|
| `archive [path]` | Save complete page as MHTML via CDP |
| `download <url|@ref> [path] [--base64]` | Download URL or media element to disk using browser cookies |
| `scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]` | Bulk download all media from page. Writes manifest.json |
### Interaction
| Command | Description |
|---------|-------------|
+7 -4
@@ -566,7 +566,7 @@ COMMAND REFERENCE:
New tab: {"command": "newtab", "args": ["URL"]}
SCOPES: ${scopeDesc}.
${scopes.includes('admin') ? '' : `To get admin access (JS, cookies, storage), ask the user to re-pair with --admin.\n`}
${scopes.includes('control') ? '' : `To get browser control access (stop, restart, disconnect), ask the user to re-pair with --control.\n`}
TOKEN: Expires ${expiresAt}. Revoke: ask the user to run
$B tunnel revoke <your-name>
@@ -591,10 +591,13 @@ function hasFlag(args: string[], flag: string): boolean {
async function handlePairAgent(state: ServerState, args: string[]): Promise<void> {
const clientName = parseFlag(args, '--client') || `remote-${Date.now()}`;
const domains = parseFlag(args, '--domain')?.split(',').map(d => d.trim());
const admin = hasFlag(args, '--admin');
const control = hasFlag(args, '--control') || hasFlag(args, '--admin');
const restrict = parseFlag(args, '--restrict');
const localHost = parseFlag(args, '--local');
// Call POST /pair to create a setup key
// Default: full access (read+write+admin+meta). --control adds browser-wide ops.
// --restrict limits: --restrict read (read-only), --restrict "read,write" (no admin)
const pairResp = await fetch(`http://127.0.0.1:${state.port}/pair`, {
method: 'POST',
headers: {
@@ -603,9 +606,9 @@ async function handlePairAgent(state: ServerState, args: string[]): Promise<void
},
body: JSON.stringify({
domains,
clientId: clientName,
admin,
control,
...(restrict ? { scopes: restrict.split(',').map(s => s.trim()) } : {}),
}),
signal: AbortSignal.timeout(5000),
});
+9
@@ -16,6 +16,7 @@ export const READ_COMMANDS = new Set([
'console', 'network', 'cookies', 'storage', 'perf',
'dialog', 'is',
'inspect',
'media', 'data',
]);
export const WRITE_COMMANDS = new Set([
@@ -24,6 +25,7 @@ export const WRITE_COMMANDS = new Set([
'viewport', 'cookie', 'cookie-import', 'cookie-import-browser', 'header', 'useragent',
'upload', 'dialog-accept', 'dialog-dismiss',
'style', 'cleanup', 'prettyscreenshot',
'download', 'scrape', 'archive',
]);
export const META_COMMANDS = new Set([
@@ -46,6 +48,7 @@ export const ALL_COMMANDS = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...MET
export const PAGE_CONTENT_COMMANDS = new Set([
'text', 'html', 'links', 'forms', 'accessibility', 'attrs',
'console', 'dialog',
'media', 'data',
]);
/** Wrap output from untrusted-content commands with trust boundary markers */
@@ -70,6 +73,8 @@ export const COMMAND_DESCRIPTIONS: Record<string, { category: string; descriptio
'links': { category: 'Reading', description: 'All links as "text → href"' },
'forms': { category: 'Reading', description: 'Form fields as JSON' },
'accessibility': { category: 'Reading', description: 'Full ARIA tree' },
'media': { category: 'Reading', description: 'All media elements (images, videos, audio) with URLs, dimensions, types', usage: 'media [--images|--videos|--audio] [selector]' },
'data': { category: 'Reading', description: 'Structured data: JSON-LD, Open Graph, Twitter Cards, meta tags', usage: 'data [--jsonld|--og|--meta|--twitter]' },
// Inspection
'js': { category: 'Inspection', description: 'Run JavaScript expression and return result as string', usage: 'js <expr>' },
'eval': { category: 'Inspection', description: 'Run JavaScript from file and return result as string (path must be under /tmp or cwd)', usage: 'eval <file>' },
@@ -100,6 +105,10 @@ export const COMMAND_DESCRIPTIONS: Record<string, { category: string; descriptio
'useragent': { category: 'Interaction', description: 'Set user agent', usage: 'useragent <string>' },
'dialog-accept': { category: 'Interaction', description: 'Auto-accept next alert/confirm/prompt. Optional text is sent as the prompt response', usage: 'dialog-accept [text]' },
'dialog-dismiss': { category: 'Interaction', description: 'Auto-dismiss next dialog' },
// Data extraction
'download': { category: 'Extraction', description: 'Download URL or media element to disk using browser cookies', usage: 'download <url|@ref> [path] [--base64]' },
'scrape': { category: 'Extraction', description: 'Bulk download all media from page. Writes manifest.json', usage: 'scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]' },
'archive': { category: 'Extraction', description: 'Save complete page as MHTML via CDP', usage: 'archive [path]' },
// Visual
'screenshot': { category: 'Visual', description: 'Save screenshot (supports element crop via CSS/@ref, --clip region, --viewport)', usage: 'screenshot [--viewport] [--clip x,y,w,h] [selector|@ref] [path]' },
'pdf': { category: 'Visual', description: 'Save as PDF', usage: 'pdf [path]' },
+177
@@ -0,0 +1,177 @@
/**
* Media extraction helper shared between `media` (read) and `scrape` (write) commands.
*
* Runs page.evaluate() to discover all media elements on the page:
* - <img> with src, srcset, currentSrc, alt, dimensions, loading, data-src
* - <video> with currentSrc, poster, duration, <source> children, HLS/DASH detection
* - <audio> with src, duration, type
* - CSS background-image (capped at 500 elements)
*/
import type { Page, Frame } from 'playwright';
export interface ImageInfo {
index: number;
src: string;
srcset: string;
currentSrc: string;
alt: string;
width: number;
height: number;
naturalWidth: number;
naturalHeight: number;
loading: string;
dataSrc: string;
visible: boolean;
}
export interface VideoSource {
src: string;
type: string;
}
export interface VideoInfo {
index: number;
src: string;
currentSrc: string;
poster: string;
width: number;
height: number;
duration: number;
type: string;
sources: VideoSource[];
isHLS: boolean;
isDASH: boolean;
}
export interface AudioInfo {
index: number;
src: string;
currentSrc: string;
duration: number;
type: string;
}
export interface BackgroundImageInfo {
index: number;
url: string;
selector: string;
element: string;
}
export interface MediaResult {
images: ImageInfo[];
videos: VideoInfo[];
audio: AudioInfo[];
backgroundImages: BackgroundImageInfo[];
total: number;
}
/** Extract all media elements from the page or a scoped subtree. */
export async function extractMedia(
target: Page | Frame,
options?: { selector?: string; filter?: 'images' | 'videos' | 'audio' },
): Promise<MediaResult> {
const result = await target.evaluate(({ scopeSelector, filter }) => {
const root = scopeSelector
? document.querySelector(scopeSelector) || document
: document;
const images: any[] = [];
const videos: any[] = [];
const audio: any[] = [];
const backgroundImages: any[] = [];
// Images
if (!filter || filter === 'images') {
const imgs = root.querySelectorAll('img');
imgs.forEach((img, i) => {
const rect = img.getBoundingClientRect();
images.push({
index: i,
src: img.src || '',
srcset: img.srcset || '',
currentSrc: img.currentSrc || '',
alt: img.alt || '',
width: img.width,
height: img.height,
naturalWidth: img.naturalWidth,
naturalHeight: img.naturalHeight,
loading: img.loading || '',
dataSrc: img.getAttribute('data-src') || img.getAttribute('data-lazy-src') || img.getAttribute('data-original') || '',
visible: rect.width > 0 && rect.height > 0 && rect.bottom > 0 && rect.right > 0,
});
});
}
// Videos
if (!filter || filter === 'videos') {
const vids = root.querySelectorAll('video');
vids.forEach((vid, i) => {
const sources = Array.from(vid.querySelectorAll('source')).map(s => ({
src: s.src || '',
type: s.type || '',
}));
const isHLS = sources.some(s => s.type.includes('mpegURL') || s.src.includes('.m3u8'));
const isDASH = sources.some(s => s.type.includes('dash') || s.src.includes('.mpd'));
videos.push({
index: i,
src: vid.src || '',
currentSrc: vid.currentSrc || '',
poster: vid.poster || '',
width: vid.videoWidth || vid.width,
height: vid.videoHeight || vid.height,
duration: isFinite(vid.duration) ? vid.duration : 0,
type: sources[0]?.type || '',
sources,
isHLS,
isDASH,
});
});
}
// Audio
if (!filter || filter === 'audio') {
const auds = root.querySelectorAll('audio');
auds.forEach((aud, i) => {
const source = aud.querySelector('source');
audio.push({
index: i,
src: aud.src || source?.src || '',
currentSrc: aud.currentSrc || '',
duration: isFinite(aud.duration) ? aud.duration : 0,
type: source?.type || '',
});
});
}
// Background images (capped at 500 elements for performance)
if (!filter || filter === 'images') {
const allElements = root.querySelectorAll('*');
let bgCount = 0;
for (let i = 0; i < allElements.length && bgCount < 500; i++) {
const el = allElements[i];
const bg = getComputedStyle(el).backgroundImage;
if (bg && bg !== 'none') {
const urlMatch = bg.match(/url\(["']?([^"')]+)["']?\)/);
if (urlMatch && urlMatch[1] && !urlMatch[1].startsWith('data:')) {
backgroundImages.push({
index: bgCount,
url: urlMatch[1],
selector: el.tagName.toLowerCase() + (el.id ? `#${el.id}` : '') + (el.className && typeof el.className === 'string' ? '.' + el.className.trim().split(/\s+/).join('.') : ''),
element: el.tagName.toLowerCase(),
});
bgCount++;
}
}
}
}
return { images, videos, audio, backgroundImages };
}, { scopeSelector: options?.selector || null, filter: options?.filter || null });
return {
...result,
total: result.images.length + result.videos.length + result.audio.length + result.backgroundImages.length,
};
}
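The HLS/DASH detection inside `extractMedia` reduces to a pure function, restated here so it can run outside the browser (a condensed re-statement of the logic above, not additional functionality):

```typescript
// Same detection rule as extractMedia: a stream is HLS if any <source> has an
// mpegURL MIME type or an .m3u8 URL; DASH if a dash type or .mpd URL.
interface Src { src: string; type: string }

function detectStream(sources: Src[]): { isHLS: boolean; isDASH: boolean } {
  return {
    isHLS: sources.some(s => s.type.includes("mpegURL") || s.src.includes(".m3u8")),
    isDASH: sources.some(s => s.type.includes("dash") || s.src.includes(".mpd")),
  };
}

console.log(detectStream([{ src: "https://cdn.example/v/master.m3u8", type: "" }]));
// { isHLS: true, isDASH: false }
```

This is why `download` can refuse HLS/DASH targets up front and point you at yt-dlp instead of writing a useless playlist file to disk.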
+41 -52
@@ -8,48 +8,16 @@ import { getCleanText } from './read-commands';
import { READ_COMMANDS, WRITE_COMMANDS, META_COMMANDS, PAGE_CONTENT_COMMANDS, wrapUntrustedContent } from './commands';
import { validateNavigationUrl } from './url-validation';
import { checkScope, type TokenInfo } from './token-registry';
import { validateOutputPath, escapeRegExp } from './path-security';
// Re-export for backward compatibility (tests import from meta-commands)
export { validateOutputPath, escapeRegExp } from './path-security';
import * as Diff from 'diff';
import * as fs from 'fs';
import * as path from 'path';
import { TEMP_DIR, isPathWithin } from './platform';
import { TEMP_DIR } from './platform';
import { resolveConfig } from './config';
import type { Frame } from 'playwright';
// Security: Path validation to prevent path traversal attacks
// Resolve safe directories through realpathSync to handle symlinks (e.g., macOS /tmp → /private/tmp)
const SAFE_DIRECTORIES = [TEMP_DIR, process.cwd()].map(d => {
try { return fs.realpathSync(d); } catch { return d; }
});
export function validateOutputPath(filePath: string): void {
const resolved = path.resolve(filePath);
// Resolve real path of the parent directory to catch symlinks.
// The file itself may not exist yet (e.g., screenshot output).
let dir = path.dirname(resolved);
let realDir: string;
try {
realDir = fs.realpathSync(dir);
} catch {
try {
realDir = fs.realpathSync(path.dirname(dir));
} catch {
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
}
}
const realResolved = path.join(realDir, path.basename(resolved));
const isSafe = SAFE_DIRECTORIES.some(dir => isPathWithin(realResolved, dir));
if (!isSafe) {
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
}
}
/** Escape special regex metacharacters in a user-supplied string to prevent ReDoS. */
export function escapeRegExp(s: string): string {
return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
/** Tokenize a pipe segment respecting double-quoted strings. */
function tokenizePipeSegment(segment: string): string[] {
const tokens: string[] = [];
@@ -117,7 +85,7 @@ export async function handleMetaCommand(
// ─── Server Control ────────────────────────────────
case 'status': {
const page = session.getPage();
const page = bm.getPage();
const tabs = bm.getTabCount();
const mode = bm.getConnectionMode();
return [
@@ -147,17 +115,20 @@ export async function handleMetaCommand(
// ─── Visual ────────────────────────────────────────
case 'screenshot': {
// Parse priority: flags (--viewport, --clip) → selector (@ref, CSS) → output path
const page = session.getPage();
// Parse priority: flags (--viewport, --clip, --base64) → selector (@ref, CSS) → output path
const page = bm.getPage();
let outputPath = `${TEMP_DIR}/browse-screenshot.png`;
let clipRect: { x: number; y: number; width: number; height: number } | undefined;
let targetSelector: string | undefined;
let viewportOnly = false;
let base64Mode = false;
const remaining: string[] = [];
for (let i = 0; i < args.length; i++) {
if (args[i] === '--viewport') {
viewportOnly = true;
} else if (args[i] === '--base64') {
base64Mode = true;
} else if (args[i] === '--clip') {
const coords = args[++i];
if (!coords) throw new Error('Usage: screenshot --clip x,y,w,h [path]');
@@ -194,8 +165,26 @@ export async function handleMetaCommand(
throw new Error('Cannot use --viewport with --clip — choose one');
}
// --base64 mode: capture to buffer instead of disk
if (base64Mode) {
let buffer: Buffer;
if (targetSelector) {
const resolved = await bm.resolveRef(targetSelector);
const locator = 'locator' in resolved ? resolved.locator : page.locator(resolved.selector);
buffer = await locator.screenshot({ timeout: 5000 });
} else if (clipRect) {
buffer = await page.screenshot({ clip: clipRect });
} else {
buffer = await page.screenshot({ fullPage: !viewportOnly });
}
if (buffer.length > 10 * 1024 * 1024) {
throw new Error('Screenshot too large for --base64 (>10MB). Use disk path instead.');
}
return `data:image/png;base64,${buffer.toString('base64')}`;
}
if (targetSelector) {
const resolved = await session.resolveRef(targetSelector);
const resolved = await bm.resolveRef(targetSelector);
const locator = 'locator' in resolved ? resolved.locator : page.locator(resolved.selector);
await locator.screenshot({ path: outputPath, timeout: 5000 });
return `Screenshot saved (element): ${outputPath}`;
@@ -211,7 +200,7 @@ export async function handleMetaCommand(
}
case 'pdf': {
const page = session.getPage();
const page = bm.getPage();
const pdfPath = args[0] || `${TEMP_DIR}/browse-page.pdf`;
validateOutputPath(pdfPath);
await page.pdf({ path: pdfPath, format: 'A4' });
@@ -219,7 +208,7 @@ export async function handleMetaCommand(
}
case 'responsive': {
const page = session.getPage();
const page = bm.getPage();
const prefix = args[0] || `${TEMP_DIR}/browse-responsive`;
validateOutputPath(prefix);
const viewports = [
@@ -344,7 +333,7 @@ export async function handleMetaCommand(
// Wait for network to settle after write commands before returning
if (lastWasWrite) {
await session.getPage().waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
await bm.getPage().waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
}
return results.join('\n\n');
@@ -355,7 +344,7 @@ export async function handleMetaCommand(
const [url1, url2] = args;
if (!url1 || !url2) throw new Error('Usage: browse diff <url1> <url2>');
const page = session.getPage();
const page = bm.getPage();
await validateNavigationUrl(url1);
await page.goto(url1, { waitUntil: 'domcontentloaded', timeout: 15000 });
const text1 = await getCleanText(page);
@@ -454,7 +443,7 @@ export async function handleMetaCommand(
// If a ref was passed, scroll it into view
if (args.length > 0 && args[0].startsWith('@')) {
try {
const resolved = await session.resolveRef(args[0]);
const resolved = await bm.resolveRef(args[0]);
if ('locator' in resolved) {
await resolved.locator.scrollIntoViewIfNeeded({ timeout: 5000 });
return `Browser activated. Scrolled ${args[0]} into view.`;
@@ -611,7 +600,7 @@ export async function handleMetaCommand(
}
}
// Close existing pages, then restore (replace, not merge)
session.setFrame(null);
bm.setFrame(null);
await bm.closeAllPages();
await bm.restoreState({
cookies: validatedCookies,
@@ -629,12 +618,12 @@ export async function handleMetaCommand(
if (!target) throw new Error('Usage: frame <selector|@ref|--name name|--url pattern|main>');
if (target === 'main') {
session.setFrame(null);
session.clearRefs();
bm.setFrame(null);
bm.clearRefs();
return 'Switched to main frame';
}
const page = session.getPage();
const page = bm.getPage();
let frame: Frame | null = null;
if (target === '--name') {
@@ -645,7 +634,7 @@ export async function handleMetaCommand(
frame = page.frame({ url: new RegExp(escapeRegExp(args[1])) });
} else {
// CSS selector or @ref for the iframe element
const resolved = await session.resolveRef(target);
const resolved = await bm.resolveRef(target);
const locator = 'locator' in resolved ? resolved.locator : page.locator(resolved.selector);
const elementHandle = await locator.elementHandle({ timeout: 5000 });
frame = await elementHandle?.contentFrame() ?? null;
@@ -653,8 +642,8 @@ export async function handleMetaCommand(
}
if (!frame) throw new Error(`Frame not found: ${target}`);
session.setFrame(frame);
session.clearRefs();
bm.setFrame(frame);
bm.clearRefs();
return `Switched to frame: ${frame.url()}`;
}
+179
@@ -0,0 +1,179 @@
/**
 * Network response body capture: SizeCappedBuffer + capture lifecycle.
*
* Architecture:
 * page.on('response') listener → filter by URL pattern → store body
* SizeCappedBuffer: evicts oldest entries when total size exceeds cap
* Export: writes JSONL file (one response per line)
*
* Memory management:
* - 50MB total buffer cap (configurable)
* - 5MB per-entry body cap (larger responses stored as metadata only)
* - Binary responses stored as base64
* - Text responses stored as-is
*/
import * as fs from 'fs';
import type { Response as PlaywrightResponse } from 'playwright';
export interface CapturedResponse {
url: string;
status: number;
headers: Record<string, string>;
body: string;
contentType: string;
timestamp: number;
size: number;
bodyTruncated: boolean;
}
const MAX_BUFFER_SIZE = 50 * 1024 * 1024; // 50MB total
const MAX_ENTRY_SIZE = 5 * 1024 * 1024; // 5MB per response body
export class SizeCappedBuffer {
private entries: CapturedResponse[] = [];
private totalSize = 0;
private readonly maxSize: number;
constructor(maxSize = MAX_BUFFER_SIZE) {
this.maxSize = maxSize;
}
push(entry: CapturedResponse): void {
// Evict oldest entries until we have room
while (this.entries.length > 0 && this.totalSize + entry.size > this.maxSize) {
const evicted = this.entries.shift()!;
this.totalSize -= evicted.size;
}
this.entries.push(entry);
this.totalSize += entry.size;
}
toArray(): CapturedResponse[] {
return [...this.entries];
}
get length(): number {
return this.entries.length;
}
get byteSize(): number {
return this.totalSize;
}
clear(): void {
this.entries = [];
this.totalSize = 0;
}
/** Export to JSONL file. */
exportToFile(filePath: string): number {
const lines = this.entries.map(e => JSON.stringify(e));
fs.writeFileSync(filePath, lines.join('\n') + '\n');
return this.entries.length;
}
/** Summary of captured responses (URL, status, size). */
summary(): string {
if (this.entries.length === 0) return 'No captured responses.';
const lines = this.entries.map((e, i) =>
` [${i + 1}] ${e.status} ${e.url.slice(0, 100)} (${Math.round(e.size / 1024)}KB${e.bodyTruncated ? ', truncated' : ''})`
);
return `${this.entries.length} responses (${Math.round(this.totalSize / 1024)}KB total):\n${lines.join('\n')}`;
}
}
/** Global capture state. */
let captureBuffer = new SizeCappedBuffer();
let captureActive = false;
let captureFilter: RegExp | null = null;
let captureListener: ((response: PlaywrightResponse) => Promise<void>) | null = null;
export function isCaptureActive(): boolean {
return captureActive;
}
export function getCaptureBuffer(): SizeCappedBuffer {
return captureBuffer;
}
/** Create the response listener function. */
function createResponseListener(filter: RegExp | null): (response: PlaywrightResponse) => Promise<void> {
return async (response: PlaywrightResponse) => {
const url = response.url();
if (filter && !filter.test(url)) return;
// Skip non-content responses (redirects, 204, etc.)
const status = response.status();
if (status === 204 || status === 301 || status === 302 || status === 304) return;
const contentType = response.headers()['content-type'] || '';
let body = '';
let bodySize = 0;
let truncated = false;
try {
const rawBody = await response.body();
bodySize = rawBody.length;
if (bodySize > MAX_ENTRY_SIZE) {
truncated = true;
body = '';
} else if (contentType.includes('json') || contentType.includes('text') || contentType.includes('xml') || contentType.includes('html')) {
body = rawBody.toString('utf-8');
} else {
body = rawBody.toString('base64');
}
} catch {
// Response body may be unavailable (e.g., streaming, aborted)
body = '';
truncated = true;
}
const entry: CapturedResponse = {
url,
status,
headers: response.headers(),
body,
contentType,
timestamp: Date.now(),
size: bodySize,
bodyTruncated: truncated,
};
captureBuffer.push(entry);
};
}
/** Start capturing response bodies. */
export function startCapture(filterPattern?: string): { filter: string | null } {
captureFilter = filterPattern ? new RegExp(filterPattern) : null;
captureActive = true;
captureListener = createResponseListener(captureFilter);
return { filter: filterPattern || null };
}
/** Get the active listener (to attach to page). */
export function getCaptureListener(): ((response: PlaywrightResponse) => Promise<void>) | null {
return captureListener;
}
/** Stop capturing. */
export function stopCapture(): { count: number; sizeKB: number } {
captureActive = false;
captureListener = null;
return {
count: captureBuffer.length,
sizeKB: Math.round(captureBuffer.byteSize / 1024),
};
}
/** Clear the capture buffer. */
export function clearCapture(): void {
captureBuffer.clear();
}
/** Export captured responses to JSONL file. */
export function exportCapture(filePath: string): number {
return captureBuffer.exportToFile(filePath);
}
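The eviction policy in `SizeCappedBuffer.push` is worth seeing in isolation: oldest entries are dropped until the incoming entry fits under the byte cap. A condensed sketch of the same logic (a `MiniBuffer` stand-in, not the class above):

```typescript
// Condensed restatement of SizeCappedBuffer's eviction: FIFO drop until
// the new entry fits under maxSize, then append.
interface Entry { size: number; url: string }

class MiniBuffer {
  private entries: Entry[] = [];
  private total = 0;
  constructor(private readonly maxSize: number) {}

  push(e: Entry): void {
    while (this.entries.length > 0 && this.total + e.size > this.maxSize) {
      this.total -= this.entries.shift()!.size; // evict oldest
    }
    this.entries.push(e);
    this.total += e.size;
  }

  get urls(): string[] { return this.entries.map(e => e.url); }
}

const buf = new MiniBuffer(100);
buf.push({ size: 60, url: "/a" });
buf.push({ size: 60, url: "/b" }); // 60 + 60 > 100, so /a is evicted
console.log(buf.urls); // ["/b"]
```

Because the per-entry cap (5MB) is far below the total cap (50MB), a single oversized push can never wedge the loop: eviction always frees enough room.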
+103
@@ -0,0 +1,103 @@
/**
 * Shared path validation: single source of truth for file path security.
*
* Previously duplicated across write-commands.ts, meta-commands.ts, and read-commands.ts.
* All file I/O commands (screenshot, pdf, download, scrape, archive, eval) must
* validate paths through these functions.
*
 * validateOutputPath(path): for writing files (screenshot, pdf, download, scrape, archive)
 * validateReadPath(path): for reading files (eval)
 * validateTempPath(path): for serving files to remote agents (GET /file, TEMP_DIR only)
*
* Security invariants:
* 1. All paths resolved to absolute before checking
* 2. Symlinks resolved to catch traversal via symlink inside safe dir
* 3. SAFE_DIRECTORIES = [TEMP_DIR, cwd] for local commands
* 4. TEMP_ONLY = [TEMP_DIR] for remote file serving (prevents project file exfil)
*/
import * as fs from 'fs';
import * as path from 'path';
import { TEMP_DIR, isPathWithin } from './platform';
// Resolve safe directories through realpathSync to handle symlinks (e.g., macOS /tmp → /private/tmp)
export const SAFE_DIRECTORIES = [TEMP_DIR, process.cwd()].map(d => {
try { return fs.realpathSync(d); } catch { return d; }
});
const TEMP_ONLY = [TEMP_DIR].map(d => {
try { return fs.realpathSync(d); } catch { return d; }
});
/** Validate a file path for writing (screenshot, pdf, download, scrape, archive). */
export function validateOutputPath(filePath: string): void {
const resolved = path.resolve(filePath);
// Resolve real path of the parent directory to catch symlinks.
// The file itself may not exist yet (e.g., screenshot output).
// This also handles macOS /tmp → /private/tmp transparently.
let dir = path.dirname(resolved);
let realDir: string;
try {
realDir = fs.realpathSync(dir);
} catch {
try {
realDir = fs.realpathSync(path.dirname(dir));
} catch {
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
}
}
const realResolved = path.join(realDir, path.basename(resolved));
const isSafe = SAFE_DIRECTORIES.some(dir => isPathWithin(realResolved, dir));
if (!isSafe) {
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
}
}
/** Validate a file path for reading (eval command). */
export function validateReadPath(filePath: string): void {
const resolved = path.resolve(filePath);
let realPath: string;
try {
realPath = fs.realpathSync(resolved);
} catch (err: any) {
if (err.code === 'ENOENT') {
try {
const dir = fs.realpathSync(path.dirname(resolved));
realPath = path.join(dir, path.basename(resolved));
} catch {
realPath = resolved;
}
} else {
throw new Error(`Cannot resolve real path: ${filePath} (${err.code})`);
}
}
const isSafe = SAFE_DIRECTORIES.some(dir => isPathWithin(realPath, dir));
if (!isSafe) {
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
}
}
/** Validate a file path for remote serving (GET /file). TEMP_DIR only, not cwd. */
export function validateTempPath(filePath: string): void {
const resolved = path.resolve(filePath);
let realPath: string;
try {
realPath = fs.realpathSync(resolved);
} catch (err: any) {
if (err.code === 'ENOENT') {
throw new Error('File not found');
}
throw new Error(`Cannot resolve path: ${filePath}`);
}
const isSafe = TEMP_ONLY.some(dir => isPathWithin(realPath, dir));
if (!isSafe) {
throw new Error(`Path must be within: ${TEMP_ONLY.join(', ')} (remote file serving is restricted to temp directory)`);
}
}
/** Escape special regex metacharacters in a user-supplied string to prevent ReDoS. */
export function escapeRegExp(s: string): string {
return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
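All three validators delegate the final containment decision to `isPathWithin` from `platform.ts`, whose source is not shown in this diff. A minimal sketch of what such a check typically looks like, built on `path.relative` (assumption: the real helper may be implemented differently):

```typescript
// Hedged sketch of a containment check in the spirit of isPathWithin
// (actual platform.ts implementation not shown in this commit).
import * as path from "path";

// A target is inside base iff the relative path from base to target
// neither climbs upward ("..") nor jumps to another root (absolute).
function pathWithin(target: string, base: string): boolean {
  const rel = path.relative(base, target);
  return rel !== "" && !rel.startsWith("..") && !path.isAbsolute(rel);
}

console.log(pathWithin("/tmp/out/shot.png", "/tmp/out"));         // true
console.log(pathWithin("/tmp/out/../../etc/passwd", "/tmp/out")); // false
```

Note that `path.relative` normalizes both arguments, which is why the `../..` traversal collapses before the prefix check runs; the validators above still call `realpathSync` first because lexical normalization alone cannot see through symlinks.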
+118 -33
@@ -10,8 +10,11 @@ import { consoleBuffer, networkBuffer, dialogBuffer } from './buffers';
import type { Page, Frame } from 'playwright';
import * as fs from 'fs';
import * as path from 'path';
import { TEMP_DIR, isPathWithin } from './platform';
import { TEMP_DIR } from './platform';
import { inspectElement, formatInspectorResult, getModificationHistory } from './cdp-inspector';
import { validateReadPath } from './path-security';
// Re-export for backward compatibility (tests import from read-commands)
export { validateReadPath } from './path-security';
// Redaction patterns for sensitive cookie/storage values — exported for test coverage
export const SENSITIVE_COOKIE_NAME = /(^|[_.-])(token|secret|key|password|credential|auth|jwt|session|csrf|sid)($|[_.-])|api.?key/i;
@@ -41,38 +44,6 @@ function wrapForEvaluate(code: string): string {
: `(async()=>(${trimmed}))()`;
}
/**
* Extract clean text from a page (strips script/style/noscript/svg).
* Exported for DRY reuse in meta-commands (diff).
@@ -254,6 +225,50 @@ export async function handleReadCommand(
networkBuffer.clear();
return 'Network buffer cleared.';
}
// Network capture extensions
if (args[0] === '--capture') {
const {
startCapture, stopCapture, getCaptureListener, isCaptureActive,
} = await import('./network-capture');
if (args[1] === 'stop') {
// Detach listener from current page
const page = bm.getPage();
const listener = getCaptureListener();
if (listener) page.removeListener('response', listener);
const result = stopCapture();
return `Network capture stopped. ${result.count} responses captured (${result.sizeKB}KB).`;
}
// Start capture
if (isCaptureActive()) return 'Capture already active. Use --capture stop first.';
const filterIdx = args.indexOf('--filter');
const filterPattern = filterIdx >= 0 ? args[filterIdx + 1] : undefined;
const info = startCapture(filterPattern);
// Attach listener to current page
const page = bm.getPage();
const listener = getCaptureListener();
if (listener) page.on('response', listener);
return `Network capture started${info.filter ? ` (filter: ${info.filter})` : ''}. Use --capture stop to stop.`;
}
if (args[0] === '--export') {
const { exportCapture } = await import('./network-capture');
const { validateOutputPath: vop } = await import('./path-security');
const exportPath = args[1];
if (!exportPath) throw new Error('Usage: network --export <path>');
vop(exportPath);
const count = exportCapture(exportPath);
return `Exported ${count} captured responses to ${exportPath}`;
}
if (args[0] === '--bodies') {
const { getCaptureBuffer } = await import('./network-capture');
return getCaptureBuffer().summary();
}
// Default: show request metadata
if (networkBuffer.length === 0) return '(no network requests)';
return networkBuffer.toArray().map(e =>
`${e.method} ${e.url} ${e.status || 'pending'} (${e.duration || '?'}ms, ${e.size || '?'}B)`
@@ -412,6 +427,76 @@ export async function handleReadCommand(
return formatInspectorResult(result, { includeUA });
}
case 'media': {
const { extractMedia } = await import('./media-extract');
const target = bm.getActiveFrameOrPage();
const filter = args.includes('--images') ? 'images' as const
: args.includes('--videos') ? 'videos' as const
: args.includes('--audio') ? 'audio' as const
: undefined;
const selectorArg = args.find(a => !a.startsWith('--'));
const result = await extractMedia(target, { selector: selectorArg, filter });
return JSON.stringify(result, null, 2);
}
case 'data': {
const target = bm.getActiveFrameOrPage();
const wantJsonLd = args.includes('--jsonld') || args.length === 0;
const wantOg = args.includes('--og') || args.length === 0;
const wantTwitter = args.includes('--twitter') || args.length === 0;
const wantMeta = args.includes('--meta') || args.length === 0;
const result = await target.evaluate(({ wantJsonLd, wantOg, wantTwitter, wantMeta }) => {
const data: Record<string, any> = {};
if (wantJsonLd) {
const scripts = document.querySelectorAll('script[type="application/ld+json"]');
const jsonLd: any[] = [];
scripts.forEach(s => {
try { jsonLd.push(JSON.parse(s.textContent || '')); } catch {}
});
data.jsonLd = jsonLd;
}
if (wantOg) {
const og: Record<string, string> = {};
document.querySelectorAll('meta[property^="og:"]').forEach(m => {
const prop = m.getAttribute('property')?.replace('og:', '') || '';
og[prop] = m.getAttribute('content') || '';
});
data.openGraph = og;
}
if (wantTwitter) {
const tw: Record<string, string> = {};
document.querySelectorAll('meta[name^="twitter:"]').forEach(m => {
const name = m.getAttribute('name')?.replace('twitter:', '') || '';
tw[name] = m.getAttribute('content') || '';
});
data.twitterCards = tw;
}
if (wantMeta) {
const meta: Record<string, string> = {};
const canonical = document.querySelector('link[rel="canonical"]');
if (canonical) meta.canonical = canonical.getAttribute('href') || '';
const desc = document.querySelector('meta[name="description"]');
if (desc) meta.description = desc.getAttribute('content') || '';
const keywords = document.querySelector('meta[name="keywords"]');
if (keywords) meta.keywords = keywords.getAttribute('content') || '';
const author = document.querySelector('meta[name="author"]');
if (author) meta.author = author.getAttribute('content') || '';
const title = document.querySelector('title');
if (title) meta.title = title.textContent || '';
data.meta = meta;
}
return data;
}, { wantJsonLd, wantOg, wantTwitter, wantMeta });
return JSON.stringify(result, null, 2);
}
default:
throw new Error(`Unknown read command: ${command}`);
}
+61 -3
@@ -32,6 +32,7 @@ import {
rotateRoot, listTokens, serializeRegistry, restoreRegistry, recordCommand,
isRootToken, checkConnectRateLimit, type TokenInfo,
} from './token-registry';
import { validateTempPath } from './path-security';
import { resolveConfig, ensureStateDir, readVersionHash } from './config';
import { emitActivity, subscribe, getActivityAfter, getActivityHistory, getSubscriberCount } from './activity';
import { inspectElement, modifyStyle, resetModifications, getModificationHistory, detachSession, type InspectorResult } from './cdp-inspector';
@@ -1457,9 +1458,12 @@ async function start() {
}
try {
const pairBody = await req.json() as any;
// Default: full access (read+write+admin+meta). The trust boundary is
// the pairing ceremony itself, not the scope. --control adds browser-wide
// destructive commands (stop, restart, disconnect). --restrict limits scope.
const scopes = pairBody.control || pairBody.admin
? ['read', 'write', 'admin', 'meta', 'control'] as const
: (pairBody.scopes || ['read', 'write', 'admin', 'meta']) as const;
const setupKey = createSetupKey({
clientId: pairBody.clientId,
scopes: [...scopes],
@@ -2031,6 +2035,60 @@ async function start() {
});
}
// ─── File serving endpoint (for remote agents to retrieve downloaded files) ────
if (url.pathname === '/file' && req.method === 'GET') {
const tokenInfo = getTokenInfo(req);
if (!tokenInfo) {
return new Response(JSON.stringify({ error: 'Unauthorized' }), {
status: 401, headers: { 'Content-Type': 'application/json' },
});
}
const filePath = url.searchParams.get('path');
if (!filePath) {
return new Response(JSON.stringify({ error: 'Missing "path" query parameter' }), {
status: 400, headers: { 'Content-Type': 'application/json' },
});
}
try {
validateTempPath(filePath);
} catch (err: any) {
return new Response(JSON.stringify({ error: err.message }), {
status: 403, headers: { 'Content-Type': 'application/json' },
});
}
if (!fs.existsSync(filePath)) {
return new Response(JSON.stringify({ error: 'File not found' }), {
status: 404, headers: { 'Content-Type': 'application/json' },
});
}
const stat = fs.statSync(filePath);
if (stat.size > 200 * 1024 * 1024) {
return new Response(JSON.stringify({ error: 'File too large (max 200MB)' }), {
status: 413, headers: { 'Content-Type': 'application/json' },
});
}
const ext = path.extname(filePath).toLowerCase();
const MIME_MAP: Record<string, string> = {
'.png': 'image/png', '.jpg': 'image/jpeg', '.jpeg': 'image/jpeg',
'.gif': 'image/gif', '.webp': 'image/webp', '.svg': 'image/svg+xml',
'.avif': 'image/avif',
'.mp4': 'video/mp4', '.webm': 'video/webm', '.mov': 'video/quicktime',
'.mp3': 'audio/mpeg', '.wav': 'audio/wav', '.ogg': 'audio/ogg',
'.pdf': 'application/pdf', '.json': 'application/json',
'.html': 'text/html', '.txt': 'text/plain', '.mhtml': 'message/rfc822',
};
const contentType = MIME_MAP[ext] || 'application/octet-stream';
resetIdleTimer();
return new Response(Bun.file(filePath), {
headers: {
'Content-Type': contentType,
'Content-Length': String(stat.size),
'Content-Disposition': `inline; filename="${path.basename(filePath)}"`,
'Cache-Control': 'no-cache',
},
});
}
// ─── Command endpoint (accepts both root AND scoped tokens) ────
// Must be checked BEFORE the blanket root-only auth gate below,
// because scoped tokens from /connect are valid for /command.
+11 -5
@@ -40,6 +40,7 @@ export const SCOPE_READ = new Set([
'snapshot', 'text', 'html', 'links', 'forms', 'accessibility',
'console', 'network', 'perf', 'dialog', 'is', 'inspect',
'url', 'tabs', 'status', 'screenshot', 'pdf', 'css', 'attrs',
'media', 'data',
]);
/** Commands that modify page state or navigate */
@@ -48,15 +49,19 @@ export const SCOPE_WRITE = new Set([
'click', 'fill', 'select', 'hover', 'type', 'press', 'scroll', 'wait',
'upload', 'viewport', 'newtab', 'closetab',
'dialog-accept', 'dialog-dismiss',
'download', 'scrape', 'archive',
]);
/** Page-level power tools — JS execution, credential access, page mutations */
export const SCOPE_ADMIN = new Set([
'eval', 'js', 'cookies', 'storage',
'cookie', 'cookie-import', 'cookie-import-browser',
'header', 'useragent',
'style', 'cleanup', 'prettyscreenshot',
]);
/** Browser-wide destructive commands — can kill the server, disconnect headed mode */
export const SCOPE_CONTROL = new Set([
'state', 'handoff', 'resume', 'stop', 'restart', 'connect', 'disconnect',
]);
@@ -66,12 +71,13 @@ export const SCOPE_META = new Set([
'watch', 'inbox', 'focus',
]);
export type ScopeCategory = 'read' | 'write' | 'admin' | 'meta' | 'control';
const SCOPE_MAP: Record<ScopeCategory, Set<string>> = {
read: SCOPE_READ,
write: SCOPE_WRITE,
admin: SCOPE_ADMIN,
control: SCOPE_CONTROL,
meta: SCOPE_META,
};
@@ -170,7 +176,7 @@ export function createToken(opts: CreateTokenOptions): TokenInfo {
} = opts;
// Validate inputs
const validScopes: ScopeCategory[] = ['read', 'write', 'admin', 'meta', 'control'];
for (const s of scopes) {
if (!validScopes.includes(s as ScopeCategory)) {
throw new Error(`Invalid scope: ${s}. Valid: ${validScopes.join(', ')}`);
@@ -297,7 +303,7 @@ export function validateToken(token: string): TokenInfo | null {
token: rootToken,
clientId: 'root',
type: 'session',
scopes: ['read', 'write', 'admin', 'meta', 'control'],
tabPolicy: 'shared',
rateLimit: 0, // unlimited
expiresAt: null,
+251 -45
@@ -10,54 +10,12 @@ import type { BrowserManager } from './browser-manager';
import { findInstalledBrowsers, importCookies, listSupportedBrowserNames } from './cookie-import-browser';
import { generatePickerCode } from './cookie-picker-routes';
import { validateNavigationUrl } from './url-validation';
import { validateOutputPath } from './path-security';
import * as fs from 'fs';
import * as path from 'path';
import { TEMP_DIR } from './platform';
import { modifyStyle, undoModification, resetModifications, getModificationHistory } from './cdp-inspector';
/**
* Aggressive page cleanup selectors and heuristics.
* Goal: make the page readable and clean while keeping it recognizable.
@@ -314,7 +272,32 @@ export async function handleWriteCommand(
}
case 'scroll': {
// Parse --times N and --wait ms flags
const timesIdx = args.indexOf('--times');
const times = timesIdx >= 0 ? parseInt(args[timesIdx + 1], 10) || 1 : 0;
const waitIdx = args.indexOf('--wait');
const waitMs = waitIdx >= 0 ? parseInt(args[waitIdx + 1], 10) || 1000 : 1000;
const selector = args.find((a, i) => !a.startsWith('--')
&& (timesIdx < 0 || i !== timesIdx + 1)
&& (waitIdx < 0 || i !== waitIdx + 1));
if (times > 0) {
// Repeated scroll mode
for (let i = 0; i < times; i++) {
if (selector) {
const resolved = await bm.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.scrollIntoViewIfNeeded({ timeout: 5000 });
} else {
await target.locator(resolved.selector).scrollIntoViewIfNeeded({ timeout: 5000 });
}
} else {
await target.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
}
if (i < times - 1) await new Promise(r => setTimeout(r, waitMs));
}
return `Scrolled ${times} times${selector ? ` (${selector})` : ''} with ${waitMs}ms delay`;
}
// Single scroll (original behavior)
if (selector) {
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
@@ -915,7 +898,230 @@ export async function handleWriteCommand(
return parts.join(' ');
}
case 'download': {
if (args.length === 0) throw new Error('Usage: download <url|@ref> [path] [--base64]');
const isBase64 = args.includes('--base64');
const filteredArgs = args.filter(a => a !== '--base64');
let url = filteredArgs[0];
const outputPath = filteredArgs[1];
// Resolve @ref to element src
if (url.startsWith('@')) {
const resolved = await bm.resolveRef(url);
if (!('locator' in resolved)) throw new Error(`Expected @ref, got CSS selector: ${url}`);
const locator = resolved.locator;
const tagName = await locator.evaluate(el => el.tagName.toLowerCase());
if (tagName === 'img') {
url = await locator.evaluate(el => {
const img = el as HTMLImageElement;
return img.currentSrc || img.src || img.getAttribute('data-src') || '';
});
} else if (tagName === 'video') {
url = await locator.evaluate(el => (el as HTMLVideoElement).currentSrc || (el as HTMLVideoElement).src || '');
} else if (tagName === 'audio') {
url = await locator.evaluate(el => (el as HTMLAudioElement).currentSrc || (el as HTMLAudioElement).src || '');
} else {
// Try src attribute on any element
url = await locator.evaluate(el => el.getAttribute('src') || '');
}
if (!url) throw new Error(`Could not extract URL from ${filteredArgs[0]} (${tagName})`);
}
// Check for HLS/DASH
if (url.includes('.m3u8') || url.includes('.mpd')) {
throw new Error('This is an HLS/DASH stream. Use yt-dlp or ffmpeg for adaptive stream downloads.');
}
// Determine output path and extension
const page = bm.getPage();
let contentType = 'application/octet-stream';
let buffer: Buffer;
if (url.startsWith('blob:')) {
// Strategy 3: Blob URL -- in-page fetch + base64
const dataUrl = await page.evaluate(async (blobUrl) => {
try {
const resp = await fetch(blobUrl);
const blob = await resp.blob();
if (blob.size > 100 * 1024 * 1024) return 'ERROR:TOO_LARGE';
return new Promise<string>((resolve, reject) => {
const reader = new FileReader();
reader.onloadend = () => resolve(reader.result as string);
reader.onerror = () => reject('Failed to read blob');
reader.readAsDataURL(blob);
});
} catch {
return 'ERROR:EXPIRED';
}
}, url);
if (dataUrl === 'ERROR:TOO_LARGE') throw new Error('Blob too large (>100MB). Use a different approach.');
if (dataUrl === 'ERROR:EXPIRED') throw new Error('Blob URL expired or inaccessible.');
const match = dataUrl.match(/^data:([^;]+);base64,(.+)$/);
if (!match) throw new Error('Failed to decode blob data');
contentType = match[1];
buffer = Buffer.from(match[2], 'base64');
} else {
// Strategy 1: Direct URL via page.request.fetch()
const response = await page.request.fetch(url, { timeout: 30000 });
const status = response.status();
if (status >= 400) {
throw new Error(`Download failed: HTTP ${status} ${response.statusText()}`);
}
contentType = response.headers()['content-type'] || 'application/octet-stream';
buffer = Buffer.from(await response.body());
if (buffer.length > 200 * 1024 * 1024) {
throw new Error('File too large (>200MB).');
}
}
// --base64 mode: return inline
if (isBase64) {
if (buffer.length > 10 * 1024 * 1024) {
throw new Error('File too large for --base64 (>10MB). Use disk download + GET /file instead.');
}
const mimeType = contentType.split(';')[0].trim();
return `data:${mimeType};base64,${buffer.toString('base64')}`;
}
// Write to disk
const ext = contentType.split(';')[0].includes('/')
? mimeToExt(contentType.split(';')[0].trim())
: '.bin';
const destPath = outputPath || path.join(TEMP_DIR, `browse-download-${Date.now()}${ext}`);
validateOutputPath(destPath);
fs.writeFileSync(destPath, buffer);
const sizeKB = Math.round(buffer.length / 1024);
return `Downloaded: ${destPath} (${sizeKB}KB, ${contentType.split(';')[0].trim()})`;
}
case 'scrape': {
if (args.length === 0) throw new Error('Usage: scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]');
const mediaType = args[0];
if (!['images', 'videos', 'media'].includes(mediaType)) {
throw new Error(`Invalid type: ${mediaType}. Use: images, videos, or media`);
}
// Parse flags
const selectorIdx = args.indexOf('--selector');
const selector = selectorIdx >= 0 ? args[selectorIdx + 1] : undefined;
const dirIdx = args.indexOf('--dir');
const dir = dirIdx >= 0 ? args[dirIdx + 1] : path.join(TEMP_DIR, `browse-scrape-${Date.now()}`);
const limitIdx = args.indexOf('--limit');
const limit = Math.min(limitIdx >= 0 ? parseInt(args[limitIdx + 1], 10) || 50 : 50, 200);
validateOutputPath(dir);
fs.mkdirSync(dir, { recursive: true });
const { extractMedia } = await import('./media-extract');
const target = bm.getActiveFrameOrPage();
const filter = mediaType === 'images' ? 'images' as const
: mediaType === 'videos' ? 'videos' as const
: undefined;
const mediaResult = await extractMedia(target, { selector, filter });
// Collect URLs to download
const urls: Array<{ url: string; type: string }> = [];
const seen = new Set<string>();
for (const img of mediaResult.images) {
const url = img.currentSrc || img.src || img.dataSrc;
if (url && !seen.has(url) && !url.startsWith('data:')) {
seen.add(url);
urls.push({ url, type: 'image' });
}
}
for (const vid of mediaResult.videos) {
const url = vid.currentSrc || vid.src;
if (url && !seen.has(url) && !url.startsWith('blob:') && !vid.isHLS && !vid.isDASH) {
seen.add(url);
urls.push({ url, type: 'video' });
}
}
for (const bg of mediaResult.backgroundImages) {
if (bg.url && !seen.has(bg.url)) {
seen.add(bg.url);
urls.push({ url: bg.url, type: 'image' });
}
}
const toDownload = urls.slice(0, limit);
const page = bm.getPage();
const manifest: any = {
url: page.url(),
scraped_at: new Date().toISOString(),
files: [] as any[],
total_size: 0,
succeeded: 0,
failed: 0,
};
const lines: string[] = [];
for (let i = 0; i < toDownload.length; i++) {
const { url, type } = toDownload[i];
try {
const response = await page.request.fetch(url, { timeout: 30000 });
if (response.status() >= 400) throw new Error(`HTTP ${response.status()}`);
const ct = response.headers()['content-type'] || 'application/octet-stream';
const ext = mimeToExt(ct.split(';')[0].trim());
const filename = `${type}-${String(i + 1).padStart(3, '0')}${ext}`;
const filePath = path.join(dir, filename);
const body = Buffer.from(await response.body());
try {
fs.writeFileSync(filePath, body);
} catch (writeErr: any) {
throw new Error(`Disk write failed: ${writeErr.message}`);
}
manifest.files.push({ path: filename, src: url, size: body.length, type: ct.split(';')[0].trim() });
manifest.total_size += body.length;
manifest.succeeded++;
lines.push(` [${i + 1}/${toDownload.length}] ${filename} (${Math.round(body.length / 1024)}KB)`);
} catch (err: any) {
manifest.files.push({ path: null, src: url, size: 0, type: '', error: err.message });
manifest.failed++;
lines.push(` [${i + 1}/${toDownload.length}] FAILED: ${err.message}`);
}
// 100ms delay between downloads
if (i < toDownload.length - 1) await new Promise(r => setTimeout(r, 100));
}
// Write manifest
fs.writeFileSync(path.join(dir, 'manifest.json'), JSON.stringify(manifest, null, 2));
return `Scraped ${toDownload.length} items to ${dir}/\n${lines.join('\n')}\n\nSummary: ${manifest.succeeded} succeeded, ${manifest.failed} failed, ${Math.round(manifest.total_size / 1024)}KB total`;
}
case 'archive': {
const page = bm.getPage();
const outputPath = args[0] || path.join(TEMP_DIR, `browse-archive-${Date.now()}.mhtml`);
validateOutputPath(outputPath);
try {
const cdp = await page.context().newCDPSession(page);
const { data } = await cdp.send('Page.captureSnapshot', { format: 'mhtml' });
await cdp.detach();
fs.writeFileSync(outputPath, data);
return `Archive saved: ${outputPath} (${Math.round(data.length / 1024)}KB, MHTML)`;
} catch (err: any) {
throw new Error(`MHTML archive requires Chromium CDP. Use 'text' or 'html' for raw page content. (${err.message})`);
}
}
default:
throw new Error(`Unknown write command: ${command}`);
}
}
/** Map MIME type to file extension. */
function mimeToExt(mime: string): string {
const map: Record<string, string> = {
'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif',
'image/webp': '.webp', 'image/svg+xml': '.svg', 'image/avif': '.avif',
'video/mp4': '.mp4', 'video/webm': '.webm', 'video/quicktime': '.mov',
'audio/mpeg': '.mp3', 'audio/wav': '.wav', 'audio/ogg': '.ogg',
'application/pdf': '.pdf', 'application/json': '.json',
'text/html': '.html', 'text/plain': '.txt',
};
return map[mime] || '.bin';
}
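For readers skimming the diff, the lookup above composes with the `Content-Type` parsing used in the `download` and `scrape` handlers. A standalone sketch (the map is truncated to a few entries here):

```typescript
// Standalone sketch (not part of the diff): turning a Content-Type
// response header into a file extension. Headers often carry
// parameters ("image/jpeg; charset=binary"), so the parameter part
// is stripped before the lookup.
function mimeToExt(mime: string): string {
  const map: Record<string, string> = {
    'image/png': '.png', 'image/jpeg': '.jpg',
    'video/mp4': '.mp4', 'audio/mpeg': '.mp3',
  };
  return map[mime] || '.bin'; // unknown types fall back to .bin
}

const header = 'image/jpeg; charset=binary';
const ext = mimeToExt(header.split(';')[0].trim()); // '.jpg'
```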
+176
@@ -0,0 +1,176 @@
/**
* Tests for the browser data platform: media extraction, network capture,
* path security, and structured data extraction.
*/
import { describe, it, expect } from 'bun:test';
import { SizeCappedBuffer, type CapturedResponse } from '../src/network-capture';
import { validateTempPath, validateOutputPath, validateReadPath } from '../src/path-security';
import { TEMP_DIR } from '../src/platform';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
// ─── SizeCappedBuffer ─────────────────────────────────────────
describe('SizeCappedBuffer', () => {
function makeEntry(size: number, url = 'https://example.com'): CapturedResponse {
return {
url,
status: 200,
headers: {},
body: 'x'.repeat(size),
contentType: 'text/plain',
timestamp: Date.now(),
size,
bodyTruncated: false,
};
}
it('stores entries within capacity', () => {
const buf = new SizeCappedBuffer(1000);
buf.push(makeEntry(100));
buf.push(makeEntry(200));
expect(buf.length).toBe(2);
expect(buf.byteSize).toBe(300);
});
it('evicts oldest entries when over capacity', () => {
const buf = new SizeCappedBuffer(500);
buf.push(makeEntry(200, 'https://a.com'));
buf.push(makeEntry(200, 'https://b.com'));
buf.push(makeEntry(200, 'https://c.com')); // should evict first entry
expect(buf.length).toBe(2);
const urls = buf.toArray().map(e => e.url);
expect(urls).toContain('https://b.com');
expect(urls).toContain('https://c.com');
expect(urls).not.toContain('https://a.com');
});
it('evicts multiple entries for one large push', () => {
const buf = new SizeCappedBuffer(500);
buf.push(makeEntry(100));
buf.push(makeEntry(100));
buf.push(makeEntry(100));
buf.push(makeEntry(400)); // evicts first two (need totalSize + 400 <= 500, so totalSize <= 100)
expect(buf.length).toBe(2); // one 100-byte entry + one 400-byte entry
expect(buf.byteSize).toBe(500);
});
it('clear resets buffer', () => {
const buf = new SizeCappedBuffer(1000);
buf.push(makeEntry(100));
buf.push(makeEntry(200));
buf.clear();
expect(buf.length).toBe(0);
expect(buf.byteSize).toBe(0);
});
it('exports to JSONL file', () => {
const buf = new SizeCappedBuffer(1000);
buf.push(makeEntry(10, 'https://a.com'));
buf.push(makeEntry(20, 'https://b.com'));
const tmpFile = path.join(os.tmpdir(), `test-export-${Date.now()}.jsonl`);
try {
const count = buf.exportToFile(tmpFile);
expect(count).toBe(2);
const lines = fs.readFileSync(tmpFile, 'utf-8').trim().split('\n');
expect(lines.length).toBe(2);
const parsed = JSON.parse(lines[0]);
expect(parsed.url).toBe('https://a.com');
} finally {
fs.unlinkSync(tmpFile);
}
});
it('summary shows entries', () => {
const buf = new SizeCappedBuffer(1000);
buf.push(makeEntry(1024, 'https://api.example.com/graphql'));
const summary = buf.summary();
expect(summary).toContain('1 responses');
expect(summary).toContain('graphql');
expect(summary).toContain('1KB');
});
it('summary shows empty message when no entries', () => {
const buf = new SizeCappedBuffer(1000);
expect(buf.summary()).toBe('No captured responses.');
});
});
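The eviction behavior these tests pin down can be reproduced with a minimal standalone sketch (an illustrative reimplementation, not the actual `SizeCappedBuffer` in src/network-capture.ts):

```typescript
// Minimal size-capped FIFO: pushing past capacity evicts the oldest
// entries until the incoming entry fits within the byte budget.
class MiniSizeCappedBuffer<T extends { size: number }> {
  private entries: T[] = [];
  byteSize = 0;
  constructor(private capacity: number) {}
  push(e: T): void {
    while (this.entries.length > 0 && this.byteSize + e.size > this.capacity) {
      this.byteSize -= this.entries.shift()!.size; // evict oldest
    }
    this.entries.push(e);
    this.byteSize += e.size;
  }
  get length(): number { return this.entries.length; }
}

const buf = new MiniSizeCappedBuffer<{ size: number }>(500);
buf.push({ size: 100 });
buf.push({ size: 100 });
buf.push({ size: 100 });
buf.push({ size: 400 }); // evicts the two oldest 100-byte entries
// leaves one 100-byte entry + the 400-byte entry (length 2, 500 bytes)
```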
// ─── validateTempPath ─────────────────────────────────────────
describe('validateTempPath', () => {
let tmpFile: string;
it('allows paths within /tmp that exist', () => {
tmpFile = path.join(TEMP_DIR, `test-temp-path-${Date.now()}.jpg`);
fs.writeFileSync(tmpFile, 'test');
try {
expect(() => validateTempPath(tmpFile)).not.toThrow();
} finally {
fs.unlinkSync(tmpFile);
}
});
it('rejects non-existent files', () => {
expect(() => validateTempPath('/tmp/nonexistent-file-12345.jpg')).toThrow(/not found/i);
});
it('rejects paths in cwd', () => {
// Use an existing file in cwd so we exercise the containment check (not the existence check)
const cwdFile = path.join(process.cwd(), 'package.json');
expect(() => validateTempPath(cwdFile)).toThrow(/temp directory/i);
});
it('rejects absolute paths outside safe dirs', () => {
expect(() => validateTempPath('/etc/passwd')).toThrow();
});
});
// ─── Command registration ─────────────────────────────────────
describe('command registration', () => {
it('all new commands have descriptions', () => {
// The load-time validation in commands.ts throws if any command
// is missing from COMMAND_DESCRIPTIONS. If this import succeeds,
// all commands are properly registered.
const { COMMAND_DESCRIPTIONS, ALL_COMMANDS } = require('../src/commands');
const newCommands = ['media', 'data', 'download', 'scrape', 'archive'];
for (const cmd of newCommands) {
expect(ALL_COMMANDS.has(cmd)).toBe(true);
expect(COMMAND_DESCRIPTIONS[cmd]).toBeTruthy();
}
});
it('new commands are in correct scope sets', () => {
const { SCOPE_READ, SCOPE_WRITE } = require('../src/token-registry');
expect(SCOPE_READ.has('media')).toBe(true);
expect(SCOPE_READ.has('data')).toBe(true);
expect(SCOPE_WRITE.has('download')).toBe(true);
expect(SCOPE_WRITE.has('scrape')).toBe(true);
expect(SCOPE_WRITE.has('archive')).toBe(true);
});
it('media and data are in PAGE_CONTENT_COMMANDS', () => {
const { PAGE_CONTENT_COMMANDS } = require('../src/commands');
expect(PAGE_CONTENT_COMMANDS.has('media')).toBe(true);
expect(PAGE_CONTENT_COMMANDS.has('data')).toBe(true);
});
});
// ─── MIME type mapping ─────────────────────────────────────────
describe('mimeToExt', () => {
// mimeToExt is a private function in write-commands.ts, so instead of
// calling it directly we verify its mappings at the source level.
it('write-commands.ts contains MIME mappings', () => {
const src = fs.readFileSync(path.join(import.meta.dir, '../src/write-commands.ts'), 'utf-8');
expect(src).toContain("'image/png': '.png'");
expect(src).toContain("'image/jpeg': '.jpg'");
expect(src).toContain("'video/mp4': '.mp4'");
expect(src).toContain("'audio/mpeg': '.mp3'");
});
});
+67
@@ -0,0 +1,67 @@
<!DOCTYPE html>
<html>
<head>
<title>Media Test Page</title>
<meta property="og:title" content="Test Product">
<meta property="og:description" content="A test product description">
<meta property="og:image" content="https://example.com/og-image.jpg">
<meta property="og:type" content="product">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Test Product Tweet">
<meta name="description" content="Page description for SEO">
<meta name="keywords" content="test, product, media">
<meta name="author" content="Test Author">
<link rel="canonical" href="https://example.com/test-product">
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Product",
"name": "Test Widget",
"description": "A widget for testing",
"image": "https://example.com/widget.jpg",
"offers": {
"@type": "Offer",
"price": "29.99",
"priceCurrency": "USD"
}
}
</script>
<style>
.hero { background-image: url('https://example.com/hero-bg.jpg'); width: 100%; height: 300px; }
.banner { background-image: url('https://example.com/banner.png'); width: 100%; height: 100px; }
</style>
</head>
<body>
<div class="hero"></div>
<div class="banner"></div>
<!-- Standard images -->
<img src="https://example.com/photo1.jpg" alt="Photo 1" width="800" height="600">
<img src="https://example.com/photo2.png" alt="Photo 2" width="400" height="300">
<!-- Lazy loaded image -->
<img data-src="https://example.com/lazy.jpg" alt="Lazy Image" loading="lazy" width="600" height="400">
<!-- Image with srcset -->
<img src="https://example.com/responsive-sm.jpg"
srcset="https://example.com/responsive-sm.jpg 480w, https://example.com/responsive-lg.jpg 1200w"
alt="Responsive Image"
width="480" height="320">
<!-- Video with sources -->
<video width="640" height="480" poster="https://example.com/poster.jpg">
<source src="https://example.com/video.mp4" type="video/mp4">
<source src="https://example.com/video.webm" type="video/webm">
</video>
<!-- HLS video -->
<video width="1920" height="1080">
<source src="https://example.com/stream.m3u8" type="application/x-mpegURL">
</video>
<!-- Audio -->
<audio>
<source src="https://example.com/podcast.mp3" type="audio/mpeg">
</audio>
</body>
</html>
+13 -13
@@ -17,6 +17,7 @@ const WRITE_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/write-comma
const SERVER_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/server.ts'), 'utf-8');
const AGENT_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/sidebar-agent.ts'), 'utf-8');
const SNAPSHOT_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/snapshot.ts'), 'utf-8');
const PATH_SECURITY_SRC = fs.readFileSync(path.join(import.meta.dir, '../src/path-security.ts'), 'utf-8');
// ─── Helper ─────────────────────────────────────────────────────────────────
@@ -159,26 +160,25 @@ describe('Task 2: CSS value validator blocks dangerous patterns', () => {
describe('Task 1: validateOutputPath uses realpathSync', () => {
describe('source-level checks', () => {
it('meta-commands.ts validateOutputPath contains realpathSync', () => {
const fn = extractFunction(META_SRC, 'validateOutputPath');
it('path-security.ts validateOutputPath contains realpathSync', () => {
const fn = extractFunction(PATH_SECURITY_SRC, 'validateOutputPath');
expect(fn).toBeTruthy();
expect(fn).toContain('realpathSync');
});
it('write-commands.ts validateOutputPath contains realpathSync', () => {
const fn = extractFunction(WRITE_SRC, 'validateOutputPath');
expect(fn).toBeTruthy();
expect(fn).toContain('realpathSync');
});
it('meta-commands.ts SAFE_DIRECTORIES resolves with realpathSync', () => {
const safeBlock = sliceBetween(META_SRC, 'const SAFE_DIRECTORIES', ';');
it('path-security.ts SAFE_DIRECTORIES resolves with realpathSync', () => {
const safeBlock = sliceBetween(PATH_SECURITY_SRC, 'const SAFE_DIRECTORIES', ';');
expect(safeBlock).toContain('realpathSync');
});
it('write-commands.ts SAFE_DIRECTORIES resolves with realpathSync', () => {
const safeBlock = sliceBetween(WRITE_SRC, 'const SAFE_DIRECTORIES', ';');
expect(safeBlock).toContain('realpathSync');
it('meta-commands.ts re-exports validateOutputPath from path-security', () => {
expect(META_SRC).toContain("from './path-security'");
expect(META_SRC).toContain('validateOutputPath');
});
it('write-commands.ts imports validateOutputPath from path-security', () => {
expect(WRITE_SRC).toContain("from './path-security'");
expect(WRITE_SRC).toContain('validateOutputPath');
});
});
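The hunk above tracks a refactor: `validateOutputPath` and `SAFE_DIRECTORIES` now live in a shared `src/path-security.ts`, which `meta-commands.ts` re-exports and `write-commands.ts` imports. The property the tests pin down is that both the candidate path and the safe roots go through `realpathSync` before the prefix check, so a symlink inside a safe directory cannot redirect a write elsewhere. A minimal sketch of that shape — names and the directory list are illustrative, not the actual gstack code:

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';
import * as os from 'node:os';

// Safe roots are themselves resolved through realpathSync, so the later
// prefix comparison happens entirely in canonical-path space.
const SAFE_DIRECTORIES = [os.tmpdir()].map((d) => fs.realpathSync(d));

function validateOutputPath(p: string): string {
  // Walk up to the deepest existing ancestor so paths to files that do not
  // exist yet can still be canonicalized, then re-attach the remainder.
  let existing = path.resolve(p);
  const rest: string[] = [];
  while (!fs.existsSync(existing)) {
    rest.unshift(path.basename(existing));
    existing = path.dirname(existing);
  }
  const real = path.join(fs.realpathSync(existing), ...rest);
  if (!SAFE_DIRECTORIES.some((d) => real === d || real.startsWith(d + path.sep))) {
    throw new Error(`refusing to write outside safe directories: ${real}`);
  }
  return real;
}
```

Comparing with `path.resolve` alone would pass a symlink check like `/tmp/link -> /etc`; canonicalizing first is what the `realpathSync` assertions in the tests are guarding.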
+4 -4
@@ -113,15 +113,15 @@ describe('generateInstructionBlock', () => {
expect(block).not.toContain('re-pair with --admin');
});
it('shows re-pair hint when admin not included', () => {
it('shows re-pair hint when control not included', () => {
const block = generateInstructionBlock({
setupKey: 'gsk_setup_nonadmin',
setupKey: 'gsk_setup_nocontrol',
serverUrl: 'https://test.ngrok.dev',
scopes: ['read', 'write'],
scopes: ['read', 'write', 'admin', 'meta'],
expiresAt: '2026-04-06T00:00:00Z',
});
expect(block).toContain('re-pair with --admin');
expect(block).toContain('re-pair with --control');
});
it('includes newtab as step 2 (agents must own their tab)', () => {
+9 -5
@@ -5,7 +5,7 @@ import {
validateToken, checkScope, checkDomain, checkRate,
revokeToken, rotateRoot, listTokens, recordCommand,
serializeRegistry, restoreRegistry, checkConnectRateLimit,
SCOPE_READ, SCOPE_WRITE, SCOPE_ADMIN, SCOPE_META,
SCOPE_READ, SCOPE_WRITE, SCOPE_ADMIN, SCOPE_CONTROL, SCOPE_META,
} from '../src/token-registry';
describe('token-registry', () => {
@@ -25,7 +25,7 @@ describe('token-registry', () => {
const info = validateToken('root-token-for-tests');
expect(info).not.toBeNull();
expect(info!.clientId).toBe('root');
expect(info!.scopes).toEqual(['read', 'write', 'admin', 'meta']);
expect(info!.scopes).toEqual(['read', 'write', 'admin', 'meta', 'control']);
expect(info!.rateLimit).toBe(0);
});
});
@@ -324,7 +324,7 @@ describe('token-registry', () => {
it('every command in commands.ts is covered by a scope', () => {
// Import the command sets to verify coverage
const allInScopes = new Set([
...SCOPE_READ, ...SCOPE_WRITE, ...SCOPE_ADMIN, ...SCOPE_META,
...SCOPE_READ, ...SCOPE_WRITE, ...SCOPE_ADMIN, ...SCOPE_CONTROL, ...SCOPE_META,
]);
// chain is a special case (checked via meta scope but dispatches subcommands)
allInScopes.add('chain');
@@ -339,8 +339,12 @@ describe('token-registry', () => {
expect(SCOPE_ADMIN.has('cookies')).toBe(true);
expect(SCOPE_ADMIN.has('storage')).toBe(true);
expect(SCOPE_ADMIN.has('useragent')).toBe(true);
expect(SCOPE_ADMIN.has('state')).toBe(true);
expect(SCOPE_ADMIN.has('handoff')).toBe(true);
// Browser-wide destructive commands moved to SCOPE_CONTROL
expect(SCOPE_CONTROL.has('state')).toBe(true);
expect(SCOPE_CONTROL.has('handoff')).toBe(true);
expect(SCOPE_CONTROL.has('stop')).toBe(true);
expect(SCOPE_CONTROL.has('restart')).toBe(true);
expect(SCOPE_CONTROL.has('disconnect')).toBe(true);
// Verify safe read commands are NOT in admin
expect(SCOPE_ADMIN.has('text')).toBe(false);
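The assertions above document a scope split: browser-wide destructive commands move out of `admin` into a new `control` scope, so a token can manage cookies, storage, and the user agent without being able to stop, restart, or disconnect the browser. Schematically — the command lists are copied from the tests, but the lookup helper is a hypothetical simplification of the real registry:

```typescript
// Scope sets as asserted by the tests above. Per-session admin commands stay
// in SCOPE_ADMIN; browser-wide destructive commands live in SCOPE_CONTROL.
const SCOPE_ADMIN = new Set(['cookies', 'storage', 'useragent']);
const SCOPE_CONTROL = new Set(['state', 'handoff', 'stop', 'restart', 'disconnect']);

// A token's scope list gates which command sets it may dispatch.
function mayRun(command: string, scopes: string[]): boolean {
  if (SCOPE_CONTROL.has(command)) return scopes.includes('control');
  if (SCOPE_ADMIN.has(command)) return scopes.includes('admin');
  return false; // read/write/meta sets omitted in this sketch
}
```

Under this split, a `['read', 'write', 'admin', 'meta']` token — the exact case the instruction-block test above exercises — can no longer `restart` the browser, which is why that test now expects the "re-pair with --control" hint.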
+1 -1
@@ -13,7 +13,7 @@ export function generateCommandReference(_ctx: TemplateContext): string {
// Category display order
const categoryOrder = [
'Navigation', 'Reading', 'Interaction', 'Inspection',
'Navigation', 'Reading', 'Extraction', 'Interaction', 'Inspection',
'Visual', 'Snapshot', 'Meta', 'Tabs', 'Server',
];