Files
gstack/browse/src/write-commands.ts
T
Garry Tan c6e6a21d1a refactor: AI slop reduction with cross-model quality review (v0.16.3.0) (#941)
* refactor: add error-handling utility module with selective catches

safeUnlink (ignores ENOENT), safeKill (ignores ESRCH), isProcessAlive
(extracted from cli.ts with Windows support), and json() Response helper.
All catches check err.code and rethrow unexpected errors instead of
swallowing silently. Unit tests cover happy path + error code paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace defensive try/catches in server.ts with utilities

Replace ~12 try/catch sites with safeUnlink/safeKill calls in shutdown,
emergencyCleanup, killAgent, and log cleanup. Convert empty catches to
selective catches with error code checks. Remove needless welcome page
try/catches (fs.existsSync doesn't need wrapping). Reduces slop-scan
empty-catch locations from 11 to 8 and error-swallowing from 24 to 18.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: extract isProcessAlive and replace try/catches in cli.ts

Move isProcessAlive to shared error-handling module. Replace ~20
try/catch sites with safeUnlink/safeKill in killServer, connect,
disconnect, and cleanup flows. Convert empty catches to selective
catches. Reduces slop-scan empty-catch from 22 to 2 locations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove unnecessary return await in content-security and read-commands

Remove 6 redundant return-await patterns where there's no enclosing
try block. Eliminates all defensive.async-noise findings from these files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add slop-scan config to exclude vendor files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace empty catches with selective error handling in sidebar-agent

Convert 8 empty catch blocks to selective catches that check err.code
(ESRCH for process kills, ENOENT for file ops). Import safeUnlink for
cancel file cleanup. Unexpected errors now propagate instead of being
silently swallowed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace empty catches and mark pass-through wrappers in browser-manager

Convert 12 empty catch blocks to selective catches: filesystem ops check
ENOENT/EACCES, browser ops check for closed/Target messages, URL parsing
checks TypeError. Add 'alias for active session' comments above 6
pass-through wrapper methods to document their purpose (and exempt from
slop-scan pass-through-wrappers rule).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: selective catches in gstack-global-discover

Convert 8 defensive catch blocks to selective error handling. Filesystem
ops check ENOENT/EACCES, process ops check exit status. Unexpected errors
now propagate instead of returning silent defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: selective catches in write-commands, cdp-inspector, meta-commands, snapshot

Convert ~27 empty/obscuring catches to selective error handling across 4
browse source files. CDP ops check for closed/Target/detached messages,
DOM ops check TypeError/DOMException, filesystem ops check ENOENT/EACCES,
JSON parsing checks SyntaxError. Remove dead code in cdp-inspector where
try/catch wrapped synchronous no-ops.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: selective catches in Chrome extension files

Convert empty catches and error-swallowing patterns across inspector.js,
content.js, background.js, and sidepanel.js. DOM catches filter
TypeError/DOMException, chrome API catches filter Extension context
invalidated, network catches filter Failed to fetch. Unexpected errors
now propagate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore isProcessAlive boolean semantics, add safeUnlinkQuiet, remove unused json()

isProcessAlive now catches ALL errors and returns false (pure boolean
probe). Callers use it in if/while conditions without try/catch, so
throwing on EPERM was a behavior change that could crash the CLI.
Windows path gets its safety catch restored.

safeUnlinkQuiet added for best-effort cleanup paths where throwing on
non-ENOENT errors (like EPERM during shutdown) would abort cleanup.

json() removed — dead code, never imported anywhere.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use safeUnlinkQuiet in shutdown and cleanup paths

Shutdown, emergency cleanup, and disconnect paths should never throw
on file deletion failures. Switched from safeUnlink (throws on EPERM)
to safeUnlinkQuiet (swallows all errors) in these best-effort paths.
Normal operation paths (startup, lock release) keep safeUnlink.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* revert: remove brittle string-matching catches and alias comments in browser-manager

Revert 6 catches that matched error messages via includes('closed'),
includes('Target'), etc. back to empty catches. These fire-and-forget
operations (page.close, bringToFront, dialog dismiss) genuinely don't
care about any error type. String matching on error messages is brittle
and will break on Playwright version bumps.

Remove 6 'alias for active session' comments that existed solely to
game slop-scan's pass-through-wrapper exemption rule.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* revert: remove brittle string-matching catches in extension files

Revert error-swallowing fixes in background.js and sidepanel.js that
matched error messages via includes('Failed to fetch'), includes(
'Extension context invalidated'), etc. In Chrome extensions, uncaught
errors crash the entire extension. The original catch-and-log pattern
is the correct choice for extension code where any error is non-fatal.

content.js and inspector.js changes kept — their TypeError/DOMException
catches are typed, not string-based.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add slop-scan usage guidelines to CLAUDE.md

Instructions for using slop-scan to improve genuine code quality, not
to game metrics or hide that we're AI-coded. Documents what to fix
(empty catches on file/process ops, typed exception narrows, return
await) and what NOT to fix (string-matching on error messages, linter
gaming comments, tightening extension/cleanup catches). Includes
utility function reference and baseline score tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add slop-scan as diagnostic in test suite

Runs slop-scan after bun test as a non-blocking diagnostic. Prints
the summary (top files, hotspots) so you see the number without it
gating anything. Available standalone via bun run slop.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: slop-diff shows only NEW findings introduced on this branch

Runs slop-scan on HEAD and the merge-base, diffs results with
line-number-insensitive fingerprinting so shifted code doesn't create
false positives. Uses git worktree for clean base comparison. Shows
net new vs removed findings. Runs automatically after bun test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: design doc for slop-scan integration in /review and /ship

Deferred plan for surfacing slop-diff findings automatically during
code review and shipping. Documents integration points, auto-fix vs
skip heuristics, and implementation notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.16.3.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:13:15 -10:00

1136 lines
48 KiB
TypeScript

/**
* Write commands — navigate and interact with pages (side effects)
*
* goto, back, forward, reload, click, fill, select, hover, type,
* press, scroll, wait, viewport, cookie, header, useragent
*/
import type { TabSession } from './tab-session';
import type { BrowserManager } from './browser-manager';
import { findInstalledBrowsers, importCookies, listSupportedBrowserNames } from './cookie-import-browser';
import { generatePickerCode } from './cookie-picker-routes';
import { validateNavigationUrl } from './url-validation';
import { validateOutputPath } from './path-security';
import * as fs from 'fs';
import * as path from 'path';
import { TEMP_DIR } from './platform';
import { modifyStyle, undoModification, resetModifications, getModificationHistory } from './cdp-inspector';
/**
* Aggressive page cleanup selectors and heuristics.
* Goal: make the page readable and clean while keeping it recognizable.
* Inspired by uBlock Origin filter lists, Readability.js, and reader mode heuristics.
*/
const CLEANUP_SELECTORS = {
ads: [
// Google Ads
'ins.adsbygoogle', '[id^="google_ads"]', '[id^="div-gpt-ad"]',
'iframe[src*="doubleclick"]', 'iframe[src*="googlesyndication"]',
'[data-google-query-id]', '.google-auto-placed',
// Generic ad patterns (uBlock Origin common filters)
'[class*="ad-banner"]', '[class*="ad-wrapper"]', '[class*="ad-container"]',
'[class*="ad-slot"]', '[class*="ad-unit"]', '[class*="ad-zone"]',
'[class*="ad-placement"]', '[class*="ad-holder"]', '[class*="ad-block"]',
'[class*="adbox"]', '[class*="adunit"]', '[class*="adwrap"]',
'[id*="ad-banner"]', '[id*="ad-wrapper"]', '[id*="ad-container"]',
'[id*="ad-slot"]', '[id*="ad_banner"]', '[id*="ad_container"]',
'[data-ad]', '[data-ad-slot]', '[data-ad-unit]', '[data-adunit]',
'[class*="sponsored"]', '[class*="Sponsored"]',
'.ad', '.ads', '.advert', '.advertisement',
'#ad', '#ads', '#advert', '#advertisement',
// Common ad network iframes
'iframe[src*="amazon-adsystem"]', 'iframe[src*="outbrain"]',
'iframe[src*="taboola"]', 'iframe[src*="criteo"]',
'iframe[src*="adsafeprotected"]', 'iframe[src*="moatads"]',
// Promoted/sponsored content
'[class*="promoted"]', '[class*="Promoted"]',
'[data-testid*="promo"]', '[class*="native-ad"]',
// Empty ad placeholders (divs with only ad classes, no real content)
'aside[class*="ad"]', 'section[class*="ad-"]',
],
cookies: [
// Cookie consent frameworks
'[class*="cookie-consent"]', '[class*="cookie-banner"]', '[class*="cookie-notice"]',
'[id*="cookie-consent"]', '[id*="cookie-banner"]', '[id*="cookie-notice"]',
'[class*="consent-banner"]', '[class*="consent-modal"]', '[class*="consent-wall"]',
'[class*="gdpr"]', '[id*="gdpr"]', '[class*="GDPR"]',
'[class*="CookieConsent"]', '[id*="CookieConsent"]',
// OneTrust (very common)
'#onetrust-consent-sdk', '.onetrust-pc-dark-filter', '#onetrust-banner-sdk',
// Cookiebot
'#CybotCookiebotDialog', '#CybotCookiebotDialogBodyUnderlay',
// TrustArc / TRUSTe
'#truste-consent-track', '.truste_overlay', '.truste_box_overlay',
// Quantcast
'.qc-cmp2-container', '#qc-cmp2-main',
// Generic patterns
'[class*="cc-banner"]', '[class*="cc-window"]', '[class*="cc-overlay"]',
'[class*="privacy-banner"]', '[class*="privacy-notice"]',
'[id*="privacy-banner"]', '[id*="privacy-notice"]',
'[class*="accept-cookies"]', '[id*="accept-cookies"]',
],
overlays: [
// Paywall / subscription overlays
'[class*="paywall"]', '[class*="Paywall"]', '[id*="paywall"]',
'[class*="subscribe-wall"]', '[class*="subscription-wall"]',
'[class*="meter-wall"]', '[class*="regwall"]', '[class*="reg-wall"]',
// Newsletter / signup popups
'[class*="newsletter-popup"]', '[class*="newsletter-modal"]',
'[class*="signup-modal"]', '[class*="signup-popup"]',
'[class*="email-capture"]', '[class*="lead-capture"]',
'[class*="popup-modal"]', '[class*="modal-overlay"]',
// Interstitials
'[class*="interstitial"]', '[id*="interstitial"]',
// Push notification prompts
'[class*="push-notification"]', '[class*="notification-prompt"]',
'[class*="web-push"]',
// Survey / feedback popups
'[class*="survey-"]', '[class*="feedback-modal"]',
'[id*="survey-"]', '[class*="nps-"]',
// App download banners
'[class*="app-banner"]', '[class*="smart-banner"]', '[class*="app-download"]',
'[id*="branch-banner"]', '.smartbanner',
// Cross-promotion / "follow us" / "preferred source" widgets
'[class*="promo-banner"]', '[class*="cross-promo"]', '[class*="partner-promo"]',
'[class*="preferred-source"]', '[class*="google-promo"]',
],
clutter: [
// Audio/podcast player widgets (not part of the article text)
'[class*="audio-player"]', '[class*="podcast-player"]', '[class*="listen-widget"]',
'[class*="everlit"]', '[class*="Everlit"]',
'audio', // bare audio elements
// Sidebar games/puzzles widgets
'[class*="puzzle"]', '[class*="daily-game"]', '[class*="games-widget"]',
'[class*="crossword-promo"]', '[class*="mini-game"]',
// "Most Popular" / "Trending" sidebar recirculation (not the top nav trending bar)
'aside [class*="most-popular"]', 'aside [class*="trending"]',
'aside [class*="most-read"]', 'aside [class*="recommended"]',
// Related articles / recirculation at bottom
'[class*="related-articles"]', '[class*="more-stories"]',
'[class*="recirculation"]', '[class*="taboola"]', '[class*="outbrain"]',
// Hearst-specific (SF Chronicle, etc.)
'[class*="nativo"]', '[data-tb-region]',
],
sticky: [
// Handled via JavaScript evaluation, not pure selectors
],
social: [
'[class*="social-share"]', '[class*="share-buttons"]', '[class*="share-bar"]',
'[class*="social-widget"]', '[class*="social-icons"]', '[class*="share-tools"]',
'iframe[src*="facebook.com/plugins"]', 'iframe[src*="platform.twitter"]',
'[class*="fb-like"]', '[class*="tweet-button"]',
'[class*="addthis"]', '[class*="sharethis"]',
// Follow prompts
'[class*="follow-us"]', '[class*="social-follow"]',
],
};
export async function handleWriteCommand(
command: string,
args: string[],
session: TabSession,
bm: BrowserManager
): Promise<string> {
const page = session.getPage();
// Frame-aware target for locator-based operations (click, fill, etc.)
const target = session.getActiveFrameOrPage();
const inFrame = session.getFrame() !== null;
switch (command) {
case 'goto': {
if (inFrame) throw new Error('Cannot use goto inside a frame. Run \'frame main\' first.');
const url = args[0];
if (!url) throw new Error('Usage: browse goto <url>');
await validateNavigationUrl(url);
const response = await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 15000 });
const status = response?.status() || 'unknown';
return `Navigated to ${url} (${status})`;
}
case 'back': {
if (inFrame) throw new Error('Cannot use back inside a frame. Run \'frame main\' first.');
await page.goBack({ waitUntil: 'domcontentloaded', timeout: 15000 });
return `Back → ${page.url()}`;
}
case 'forward': {
if (inFrame) throw new Error('Cannot use forward inside a frame. Run \'frame main\' first.');
await page.goForward({ waitUntil: 'domcontentloaded', timeout: 15000 });
return `Forward → ${page.url()}`;
}
case 'reload': {
if (inFrame) throw new Error('Cannot use reload inside a frame. Run \'frame main\' first.');
await page.reload({ waitUntil: 'domcontentloaded', timeout: 15000 });
return `Reloaded ${page.url()}`;
}
case 'click': {
const selector = args[0];
if (!selector) throw new Error('Usage: browse click <selector>');
// Auto-route: if ref points to a real <option> inside a <select>, use selectOption
const role = session.getRefRole(selector);
if (role === 'option') {
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
const optionInfo = await resolved.locator.evaluate(el => {
if (el.tagName !== 'OPTION') return null; // custom [role=option], not real <option>
const option = el as HTMLOptionElement;
const select = option.closest('select');
if (!select) return null;
return { value: option.value, text: option.text };
});
if (optionInfo) {
await resolved.locator.locator('xpath=ancestor::select').selectOption(optionInfo.value, { timeout: 5000 });
return `Selected "${optionInfo.text}" (auto-routed from click on <option>) → now at ${page.url()}`;
}
// Real <option> with no parent <select> or custom [role=option] — fall through to normal click
}
}
const resolved = await session.resolveRef(selector);
try {
if ('locator' in resolved) {
await resolved.locator.click({ timeout: 5000 });
} else {
await target.locator(resolved.selector).click({ timeout: 5000 });
}
} catch (err: any) {
// Enhanced error guidance: clicking <option> elements always fails (not visible / timeout)
const isOption = 'locator' in resolved
? await resolved.locator.evaluate(el => el.tagName === 'OPTION').catch(() => false)
: await target.locator(resolved.selector).evaluate(
el => el.tagName === 'OPTION'
).catch(() => false);
if (isOption) {
throw new Error(
`Cannot click <option> elements. Use 'browse select <parent-select> <value>' instead of 'click' for dropdown options.`
);
}
throw err;
}
// Wait for network to settle (catches XHR/fetch triggered by clicks)
await page.waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
return `Clicked ${selector} → now at ${page.url()}`;
}
case 'fill': {
const [selector, ...valueParts] = args;
const value = valueParts.join(' ');
if (!selector || !value) throw new Error('Usage: browse fill <selector> <value>');
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.fill(value, { timeout: 5000 });
} else {
await target.locator(resolved.selector).fill(value, { timeout: 5000 });
}
// Wait for network to settle (form validation XHRs)
await page.waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
return `Filled ${selector}`;
}
case 'select': {
const [selector, ...valueParts] = args;
const value = valueParts.join(' ');
if (!selector || !value) throw new Error('Usage: browse select <selector> <value>');
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.selectOption(value, { timeout: 5000 });
} else {
await target.locator(resolved.selector).selectOption(value, { timeout: 5000 });
}
// Wait for network to settle (dropdown-triggered requests)
await page.waitForLoadState('networkidle', { timeout: 2000 }).catch(() => {});
return `Selected "${value}" in ${selector}`;
}
case 'hover': {
const selector = args[0];
if (!selector) throw new Error('Usage: browse hover <selector>');
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.hover({ timeout: 5000 });
} else {
await target.locator(resolved.selector).hover({ timeout: 5000 });
}
return `Hovered ${selector}`;
}
case 'type': {
const text = args.join(' ');
if (!text) throw new Error('Usage: browse type <text>');
await page.keyboard.type(text);
return `Typed ${text.length} characters`;
}
case 'press': {
const key = args[0];
if (!key) throw new Error('Usage: browse press <key> (e.g., Enter, Tab, Escape)');
await page.keyboard.press(key);
return `Pressed ${key}`;
}
case 'scroll': {
// Parse --times N and --wait ms flags
const timesIdx = args.indexOf('--times');
const times = timesIdx >= 0 ? parseInt(args[timesIdx + 1], 10) || 1 : 0;
const waitIdx = args.indexOf('--wait');
const waitMs = waitIdx >= 0 ? parseInt(args[waitIdx + 1], 10) || 1000 : 1000;
const selector = args.find(a => !a.startsWith('--') && args.indexOf(a) !== timesIdx + 1 && args.indexOf(a) !== waitIdx + 1);
if (times > 0) {
// Repeated scroll mode
for (let i = 0; i < times; i++) {
if (selector) {
const resolved = await bm.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.scrollIntoViewIfNeeded({ timeout: 5000 });
} else {
await target.locator(resolved.selector).scrollIntoViewIfNeeded({ timeout: 5000 });
}
} else {
await target.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
}
if (i < times - 1) await new Promise(r => setTimeout(r, waitMs));
}
return `Scrolled ${times} times${selector ? ` (${selector})` : ''} with ${waitMs}ms delay`;
}
// Single scroll (original behavior)
if (selector) {
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.scrollIntoViewIfNeeded({ timeout: 5000 });
} else {
await target.locator(resolved.selector).scrollIntoViewIfNeeded({ timeout: 5000 });
}
return `Scrolled ${selector} into view`;
}
await target.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
return 'Scrolled to bottom';
}
case 'wait': {
const selector = args[0];
if (!selector) throw new Error('Usage: browse wait <selector|--networkidle|--load|--domcontentloaded>');
if (selector === '--networkidle') {
const MAX_WAIT_MS = 300_000;
const MIN_WAIT_MS = 1_000;
const timeout = Math.min(Math.max(args[1] ? parseInt(args[1], 10) || MIN_WAIT_MS : 15000, MIN_WAIT_MS), MAX_WAIT_MS);
await page.waitForLoadState('networkidle', { timeout });
return 'Network idle';
}
if (selector === '--load') {
await page.waitForLoadState('load');
return 'Page loaded';
}
if (selector === '--domcontentloaded') {
await page.waitForLoadState('domcontentloaded');
return 'DOM content loaded';
}
const MAX_WAIT_MS = 300_000;
const MIN_WAIT_MS = 1_000;
const timeout = Math.min(Math.max(args[1] ? parseInt(args[1], 10) || MIN_WAIT_MS : 15000, MIN_WAIT_MS), MAX_WAIT_MS);
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.waitFor({ state: 'visible', timeout });
} else {
await target.locator(resolved.selector).waitFor({ state: 'visible', timeout });
}
return `Element ${selector} appeared`;
}
case 'viewport': {
const size = args[0];
if (!size || !size.includes('x')) throw new Error('Usage: browse viewport <WxH> (e.g., 375x812)');
const [rawW, rawH] = size.split('x').map(Number);
const w = Math.min(Math.max(Math.round(rawW) || 1280, 1), 16384);
const h = Math.min(Math.max(Math.round(rawH) || 720, 1), 16384);
await bm.setViewport(w, h);
return `Viewport set to ${w}x${h}`;
}
case 'cookie': {
const cookieStr = args[0];
if (!cookieStr || !cookieStr.includes('=')) throw new Error('Usage: browse cookie <name>=<value>');
const eq = cookieStr.indexOf('=');
const name = cookieStr.slice(0, eq);
const value = cookieStr.slice(eq + 1);
const url = new URL(page.url());
await page.context().addCookies([{
name,
value,
domain: url.hostname,
path: '/',
}]);
return `Cookie set: ${name}=****`;
}
case 'header': {
const headerStr = args[0];
if (!headerStr || !headerStr.includes(':')) throw new Error('Usage: browse header <name>:<value>');
const sep = headerStr.indexOf(':');
const name = headerStr.slice(0, sep).trim();
const value = headerStr.slice(sep + 1).trim();
await bm.setExtraHeader(name, value);
const sensitiveHeaders = ['authorization', 'cookie', 'set-cookie', 'x-api-key', 'x-auth-token'];
const redactedValue = sensitiveHeaders.includes(name.toLowerCase()) ? '****' : value;
return `Header set: ${name}: ${redactedValue}`;
}
case 'useragent': {
const ua = args.join(' ');
if (!ua) throw new Error('Usage: browse useragent <string>');
bm.setUserAgent(ua);
const error = await bm.recreateContext();
if (error) {
return `User agent set to "${ua}" but: ${error}`;
}
return `User agent set: ${ua}`;
}
case 'upload': {
const [selector, ...filePaths] = args;
if (!selector || filePaths.length === 0) throw new Error('Usage: browse upload <selector> <file1> [file2...]');
// Validate paths are within safe directories (same check as cookie-import)
for (const fp of filePaths) {
if (!fs.existsSync(fp)) throw new Error(`File not found: ${fp}`);
if (path.isAbsolute(fp)) {
let resolvedFp: string;
try { resolvedFp = fs.realpathSync(path.resolve(fp)); } catch (err: any) { if (err?.code !== 'ENOENT') throw err; resolvedFp = path.resolve(fp); }
if (!SAFE_DIRECTORIES.some(dir => isPathWithin(resolvedFp, dir))) {
throw new Error(`Path must be within: ${SAFE_DIRECTORIES.join(', ')}`);
}
}
if (path.normalize(fp).includes('..')) {
throw new Error('Path traversal sequences (..) are not allowed');
}
}
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
await resolved.locator.setInputFiles(filePaths);
} else {
await target.locator(resolved.selector).setInputFiles(filePaths);
}
const fileInfo = filePaths.map(fp => {
const stat = fs.statSync(fp);
return `${path.basename(fp)} (${stat.size}B)`;
}).join(', ');
return `Uploaded: ${fileInfo}`;
}
case 'dialog-accept': {
const text = args.length > 0 ? args.join(' ') : null;
bm.setDialogAutoAccept(true);
bm.setDialogPromptText(text);
return text
? `Dialogs will be accepted with text: "${text}"`
: 'Dialogs will be accepted';
}
case 'dialog-dismiss': {
bm.setDialogAutoAccept(false);
bm.setDialogPromptText(null);
return 'Dialogs will be dismissed';
}
case 'cookie-import': {
const filePath = args[0];
if (!filePath) throw new Error('Usage: browse cookie-import <json-file>');
// Path validation — prevent reading arbitrary files
if (path.isAbsolute(filePath)) {
const safeDirs = [TEMP_DIR, process.cwd()];
const resolved = path.resolve(filePath);
if (!safeDirs.some(dir => isPathWithin(resolved, dir))) {
throw new Error(`Path must be within: ${safeDirs.join(', ')}`);
}
}
if (path.normalize(filePath).includes('..')) {
throw new Error('Path traversal sequences (..) are not allowed');
}
if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`);
const raw = fs.readFileSync(filePath, 'utf-8');
let cookies: any[];
try { cookies = JSON.parse(raw); } catch (err: any) { throw new Error(`Invalid JSON in ${filePath}: ${err?.message || err}`); }
if (!Array.isArray(cookies)) throw new Error('Cookie file must contain a JSON array');
// Auto-fill domain from current page URL when missing (consistent with cookie command)
const pageUrl = new URL(page.url());
const defaultDomain = pageUrl.hostname;
for (const c of cookies) {
if (!c.name || c.value === undefined) throw new Error('Each cookie must have "name" and "value" fields');
if (!c.domain) {
c.domain = defaultDomain;
} else {
const cookieDomain = c.domain.startsWith('.') ? c.domain.slice(1) : c.domain;
if (cookieDomain !== defaultDomain && !defaultDomain.endsWith('.' + cookieDomain)) {
throw new Error(`Cookie domain "${c.domain}" does not match current page domain "${defaultDomain}". Use the target site first.`);
}
}
if (!c.path) c.path = '/';
}
await page.context().addCookies(cookies);
return `Loaded ${cookies.length} cookies from ${filePath}`;
}
case 'cookie-import-browser': {
// Two modes:
// 1. Direct CLI import: cookie-import-browser <browser> --domain <domain> [--profile <profile>]
// 2. Open picker UI: cookie-import-browser [browser]
const browserArg = args[0];
const domainIdx = args.indexOf('--domain');
const profileIdx = args.indexOf('--profile');
const profile = (profileIdx !== -1 && profileIdx + 1 < args.length) ? args[profileIdx + 1] : 'Default';
if (domainIdx !== -1 && domainIdx + 1 < args.length) {
// Direct import mode — no UI
const domain = args[domainIdx + 1];
// Validate --domain against current page hostname to prevent cross-site cookie injection
const pageHostname = new URL(page.url()).hostname;
const normalizedDomain = domain.startsWith('.') ? domain.slice(1) : domain;
if (normalizedDomain !== pageHostname && !pageHostname.endsWith('.' + normalizedDomain)) {
throw new Error(`--domain "${domain}" does not match current page domain "${pageHostname}". Navigate to the target site first.`);
}
const browser = browserArg || 'comet';
const result = await importCookies(browser, [domain], profile);
if (result.cookies.length > 0) {
await page.context().addCookies(result.cookies);
}
const msg = [`Imported ${result.count} cookies for ${domain} from ${browser}`];
if (result.failed > 0) msg.push(`(${result.failed} failed to decrypt)`);
return msg.join(' ');
}
// Picker UI mode — open in user's browser
const port = bm.serverPort;
if (!port) throw new Error('Server port not available');
const browsers = findInstalledBrowsers();
if (browsers.length === 0) {
throw new Error(`No Chromium browsers found. Supported: ${listSupportedBrowserNames().join(', ')}`);
}
const code = generatePickerCode();
const pickerUrl = `http://127.0.0.1:${port}/cookie-picker?code=${code}`;
try {
Bun.spawn(['open', pickerUrl], { stdout: 'ignore', stderr: 'ignore' });
} catch (err: any) {
// open may fail on non-macOS or if 'open' binary is missing — URL is in the message below
if (err?.code !== 'ENOENT' && !err?.message?.includes('spawn')) throw err;
}
return `Cookie picker opened at http://127.0.0.1:${port}/cookie-picker\nDetected browsers: ${browsers.map(b => b.name).join(', ')}\nSelect domains to import, then close the picker when done.`;
}
case 'style': {
// style --undo [N] → revert modification
if (args[0] === '--undo') {
const idx = args[1] ? parseInt(args[1], 10) : undefined;
await undoModification(page, idx);
return idx !== undefined ? `Reverted modification #${idx}` : 'Reverted last modification';
}
// style <selector> <property> <value>
const [selector, property, ...valueParts] = args;
const value = valueParts.join(' ');
if (!selector || !property || !value) {
throw new Error('Usage: browse style <sel> <prop> <value> | style --undo [N]');
}
// Validate CSS property name
if (!/^[a-zA-Z-]+$/.test(property)) {
throw new Error(`Invalid CSS property name: ${property}. Only letters and hyphens allowed.`);
}
// Validate CSS value — block data exfiltration patterns
const DANGEROUS_CSS = /url\s*\(|expression\s*\(|@import|javascript:|data:/i;
if (DANGEROUS_CSS.test(value)) {
throw new Error('CSS value rejected: contains potentially dangerous pattern.');
}
const mod = await modifyStyle(page, selector, property, value);
return `Style modified: ${selector} { ${property}: ${mod.oldValue || '(none)'}${value} } (${mod.method})`;
}
case 'cleanup': {
// Parse flags
let doAds = false, doCookies = false, doSticky = false, doSocial = false;
let doOverlays = false, doClutter = false;
let doAll = false;
// Default to --all if no args (most common use case from sidebar button)
if (args.length === 0) {
doAll = true;
}
for (const arg of args) {
switch (arg) {
case '--ads': doAds = true; break;
case '--cookies': doCookies = true; break;
case '--sticky': doSticky = true; break;
case '--social': doSocial = true; break;
case '--overlays': doOverlays = true; break;
case '--clutter': doClutter = true; break;
case '--all': doAll = true; break;
default:
throw new Error(`Unknown cleanup flag: ${arg}. Use: --ads, --cookies, --sticky, --social, --overlays, --clutter, --all`);
}
}
if (doAll) {
doAds = doCookies = doSticky = doSocial = doOverlays = doClutter = true;
}
const removed: string[] = [];
// Build selector list for categories to clean
const selectors: string[] = [];
if (doAds) selectors.push(...CLEANUP_SELECTORS.ads);
if (doCookies) selectors.push(...CLEANUP_SELECTORS.cookies);
if (doSocial) selectors.push(...CLEANUP_SELECTORS.social);
if (doOverlays) selectors.push(...CLEANUP_SELECTORS.overlays);
if (doClutter) selectors.push(...CLEANUP_SELECTORS.clutter);
if (selectors.length > 0) {
const count = await page.evaluate((sels: string[]) => {
let removed = 0;
for (const sel of sels) {
try {
const els = document.querySelectorAll(sel);
els.forEach(el => {
(el as HTMLElement).style.setProperty('display', 'none', 'important');
removed++;
});
} catch (err: any) {
// querySelectorAll throws DOMException on invalid CSS selectors — skip those
if (!(err instanceof DOMException)) throw err;
}
}
return removed;
}, selectors);
if (count > 0) {
if (doAds) removed.push('ads');
if (doCookies) removed.push('cookie banners');
if (doSocial) removed.push('social widgets');
if (doOverlays) removed.push('overlays/popups');
if (doClutter) removed.push('clutter');
}
}
// Sticky/fixed elements — handled separately with computed style check
if (doSticky) {
const stickyCount = await page.evaluate(() => {
let removed = 0;
// Collect all sticky/fixed elements, sort by vertical position
const stickyEls: Array<{ el: Element; top: number; width: number; height: number }> = [];
const allElements = document.querySelectorAll('*');
const viewportWidth = window.innerWidth;
for (const el of allElements) {
const style = getComputedStyle(el);
if (style.position === 'fixed' || style.position === 'sticky') {
const rect = el.getBoundingClientRect();
stickyEls.push({ el, top: rect.top, width: rect.width, height: rect.height });
}
}
// Sort by vertical position (topmost first)
stickyEls.sort((a, b) => a.top - b.top);
let preservedTopNav = false;
for (const { el, top, width, height } of stickyEls) {
const tag = el.tagName.toLowerCase();
// Always skip nav/header semantic elements
if (tag === 'nav' || tag === 'header') continue;
if (el.getAttribute('role') === 'navigation') continue;
// Skip the gstack control indicator
if ((el as HTMLElement).id === 'gstack-ctrl') continue;
// Preserve the FIRST full-width element near the top (site's main nav bar)
// This catches divs that act as navbars but aren't semantic <nav> elements
if (!preservedTopNav && top <= 50 && width > viewportWidth * 0.8 && height < 120) {
preservedTopNav = true;
continue;
}
(el as HTMLElement).style.setProperty('display', 'none', 'important');
removed++;
}
return removed;
});
if (stickyCount > 0) removed.push(`${stickyCount} sticky/fixed elements`);
}
// Unlock scrolling (many sites lock body scroll when modals are open)
const scrollFixed = await page.evaluate(() => {
let fixed = 0;
// Unlock body and html scroll
for (const el of [document.body, document.documentElement]) {
if (!el) continue;
const style = getComputedStyle(el);
if (style.overflow === 'hidden' || style.overflowY === 'hidden') {
(el as HTMLElement).style.setProperty('overflow', 'auto', 'important');
(el as HTMLElement).style.setProperty('overflow-y', 'auto', 'important');
fixed++;
}
// Remove height:100% + position:fixed that locks scroll
if (style.position === 'fixed' && (el === document.body || el === document.documentElement)) {
(el as HTMLElement).style.setProperty('position', 'static', 'important');
fixed++;
}
}
// Remove blur/filter effects (paywalls often blur the content)
const blurred = document.querySelectorAll('[style*="blur"], [style*="filter"]');
blurred.forEach(el => {
const s = (el as HTMLElement).style;
if (s.filter?.includes('blur') || s.webkitFilter?.includes('blur')) {
s.setProperty('filter', 'none', 'important');
s.setProperty('-webkit-filter', 'none', 'important');
fixed++;
}
});
// Remove max-height truncation (article truncation)
const truncated = document.querySelectorAll('[class*="truncat"], [class*="preview"], [class*="teaser"]');
truncated.forEach(el => {
const s = getComputedStyle(el);
if (s.maxHeight && s.maxHeight !== 'none' && parseInt(s.maxHeight) < 500) {
(el as HTMLElement).style.setProperty('max-height', 'none', 'important');
(el as HTMLElement).style.setProperty('overflow', 'visible', 'important');
fixed++;
}
});
return fixed;
});
if (scrollFixed > 0) removed.push('scroll unlocked');
// Remove "ADVERTISEMENT" / "Article continues below" text labels
const adLabelCount = await page.evaluate(() => {
let removed = 0;
const adTextPatterns = [
/^advertisement$/i, /^sponsored$/i, /^promoted$/i,
/article continues/i, /continues below/i,
/^ad$/i, /^paid content$/i, /^partner content$/i,
];
// Walk text-heavy small elements looking for ad labels
const candidates = document.querySelectorAll('div, span, p, figcaption, label');
for (const el of candidates) {
const text = (el.textContent || '').trim();
if (text.length > 50) continue; // Too much text, probably real content
if (adTextPatterns.some(p => p.test(text))) {
// Also hide the parent if it's a wrapper with little else
const parent = el.parentElement;
if (parent && (parent.textContent || '').trim().length < 80) {
(parent as HTMLElement).style.setProperty('display', 'none', 'important');
} else {
(el as HTMLElement).style.setProperty('display', 'none', 'important');
}
removed++;
}
}
return removed;
});
if (adLabelCount > 0) removed.push(`${adLabelCount} ad labels`);
// Remove empty ad placeholder whitespace (divs that are now empty after ad removal)
const collapsedCount = await page.evaluate(() => {
let collapsed = 0;
const candidates = document.querySelectorAll(
'div[class*="ad"], div[id*="ad"], aside[class*="ad"], div[class*="sidebar"], ' +
'div[class*="rail"], div[class*="right-col"], div[class*="widget"]'
);
for (const el of candidates) {
const rect = el.getBoundingClientRect();
// If the element has significant height but no visible text content, collapse it
if (rect.height > 50 && rect.width > 0) {
const text = (el.textContent || '').trim();
const images = el.querySelectorAll('img:not([src*="logo"]):not([src*="icon"])');
const links = el.querySelectorAll('a');
// Empty or mostly empty: collapse
if (text.length < 20 && images.length === 0 && links.length < 2) {
(el as HTMLElement).style.setProperty('display', 'none', 'important');
collapsed++;
}
}
}
return collapsed;
});
if (collapsedCount > 0) removed.push(`${collapsedCount} empty placeholders`);
if (removed.length === 0) return 'No clutter elements found to remove.';
return `Cleaned up: ${removed.join(', ')}`;
}
case 'prettyscreenshot': {
// Parse flags
let scrollTo: string | undefined;
let doCleanup = false;
const hideSelectors: string[] = [];
let viewportWidth: number | undefined;
let outputPath: string | undefined;
for (let i = 0; i < args.length; i++) {
if (args[i] === '--scroll-to' && i + 1 < args.length) {
scrollTo = args[++i];
} else if (args[i] === '--cleanup') {
doCleanup = true;
} else if (args[i] === '--hide' && i + 1 < args.length) {
// Collect all following non-flag args as selectors to hide
i++;
while (i < args.length && !args[i].startsWith('--')) {
hideSelectors.push(args[i]);
i++;
}
i--; // Back up since the for loop will increment
} else if (args[i] === '--width' && i + 1 < args.length) {
viewportWidth = parseInt(args[++i], 10);
if (isNaN(viewportWidth)) throw new Error('--width must be a number');
} else if (!args[i].startsWith('--')) {
outputPath = args[i];
} else {
throw new Error(`Unknown prettyscreenshot flag: ${args[i]}`);
}
}
// Default output path
if (!outputPath) {
const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, 19);
outputPath = `${TEMP_DIR}/browse-pretty-${timestamp}.png`;
}
validateOutputPath(outputPath);
const originalViewport = page.viewportSize();
// Set viewport width if specified
if (viewportWidth && originalViewport) {
await page.setViewportSize({ width: viewportWidth, height: originalViewport.height });
}
// Run cleanup if requested
if (doCleanup) {
const allSelectors = [
...CLEANUP_SELECTORS.ads,
...CLEANUP_SELECTORS.cookies,
...CLEANUP_SELECTORS.social,
];
await page.evaluate((sels: string[]) => {
for (const sel of sels) {
try {
document.querySelectorAll(sel).forEach(el => {
(el as HTMLElement).style.display = 'none';
});
} catch (err: any) {
if (!(err instanceof DOMException)) throw err;
}
}
// Also hide fixed/sticky (except nav)
for (const el of document.querySelectorAll('*')) {
const style = getComputedStyle(el);
if (style.position === 'fixed' || style.position === 'sticky') {
const tag = el.tagName.toLowerCase();
if (tag === 'nav' || tag === 'header') continue;
if (el.getAttribute('role') === 'navigation') continue;
(el as HTMLElement).style.display = 'none';
}
}
}, allSelectors);
}
// Hide specific elements
if (hideSelectors.length > 0) {
await page.evaluate((sels: string[]) => {
for (const sel of sels) {
try {
document.querySelectorAll(sel).forEach(el => {
(el as HTMLElement).style.display = 'none';
});
} catch (err: any) {
if (!(err instanceof DOMException)) throw err;
}
}
}, hideSelectors);
}
// Scroll to target
if (scrollTo) {
// Try as CSS selector first, then as text content
const scrolled = await page.evaluate((target: string) => {
// Try CSS selector
let el = document.querySelector(target);
if (el) {
el.scrollIntoView({ behavior: 'instant', block: 'center' });
return true;
}
// Try text match
const walker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_TEXT,
null,
);
let node: Node | null;
while ((node = walker.nextNode())) {
if (node.textContent?.includes(target)) {
const parent = node.parentElement;
if (parent) {
parent.scrollIntoView({ behavior: 'instant', block: 'center' });
return true;
}
}
}
return false;
}, scrollTo);
if (!scrolled) {
// Restore viewport before throwing
if (viewportWidth && originalViewport) {
await page.setViewportSize(originalViewport);
}
throw new Error(`Could not find element or text to scroll to: ${scrollTo}`);
}
// Brief wait for scroll to settle
await page.waitForTimeout(300);
}
// Take screenshot
await page.screenshot({ path: outputPath, fullPage: !scrollTo });
// Restore viewport
if (viewportWidth && originalViewport) {
await page.setViewportSize(originalViewport);
}
const parts = ['Screenshot saved'];
if (doCleanup) parts.push('(cleaned)');
if (scrollTo) parts.push(`(scrolled to: ${scrollTo})`);
parts.push(`: ${outputPath}`);
return parts.join(' ');
}
case 'download': {
if (args.length === 0) throw new Error('Usage: download <url|@ref> [path] [--base64]');
const isBase64 = args.includes('--base64');
const filteredArgs = args.filter(a => a !== '--base64');
let url = filteredArgs[0];
const outputPath = filteredArgs[1];
// Resolve @ref to element src
if (url.startsWith('@')) {
const resolved = await bm.resolveRef(url);
if (!('locator' in resolved)) throw new Error(`Expected @ref, got CSS selector: ${url}`);
const locator = resolved.locator;
const tagName = await locator.evaluate(el => el.tagName.toLowerCase());
if (tagName === 'img') {
url = await locator.evaluate(el => {
const img = el as HTMLImageElement;
return img.currentSrc || img.src || img.getAttribute('data-src') || '';
});
} else if (tagName === 'video') {
url = await locator.evaluate(el => (el as HTMLVideoElement).currentSrc || (el as HTMLVideoElement).src || '');
} else if (tagName === 'audio') {
url = await locator.evaluate(el => (el as HTMLAudioElement).currentSrc || (el as HTMLAudioElement).src || '');
} else {
// Try src attribute on any element
url = await locator.evaluate(el => el.getAttribute('src') || '');
}
if (!url) throw new Error(`Could not extract URL from ${filteredArgs[0]} (${tagName})`);
}
// Check for HLS/DASH
if (url.includes('.m3u8') || url.includes('.mpd')) {
throw new Error('This is an HLS/DASH stream. Use yt-dlp or ffmpeg for adaptive stream downloads.');
}
// Determine output path and extension
const page = bm.getPage();
let contentType = 'application/octet-stream';
let buffer: Buffer;
if (url.startsWith('blob:')) {
// Strategy 3: Blob URL -- in-page fetch + base64
const dataUrl = await page.evaluate(async (blobUrl) => {
try {
const resp = await fetch(blobUrl);
const blob = await resp.blob();
if (blob.size > 100 * 1024 * 1024) return 'ERROR:TOO_LARGE';
return new Promise<string>((resolve, reject) => {
const reader = new FileReader();
reader.onloadend = () => resolve(reader.result as string);
reader.onerror = () => reject('Failed to read blob');
reader.readAsDataURL(blob);
});
} catch (err: any) {
return `ERROR:EXPIRED:${err?.message || 'unknown'}`;
}
}, url);
if (dataUrl === 'ERROR:TOO_LARGE') throw new Error('Blob too large (>100MB). Use a different approach.');
if (dataUrl.startsWith('ERROR:EXPIRED')) throw new Error(`Blob URL expired or inaccessible: ${dataUrl.slice('ERROR:EXPIRED:'.length)}`);
const match = dataUrl.match(/^data:([^;]+);base64,(.+)$/);
if (!match) throw new Error('Failed to decode blob data');
contentType = match[1];
buffer = Buffer.from(match[2], 'base64');
} else {
// Strategy 1: Direct URL via page.request.fetch()
const response = await page.request.fetch(url, { timeout: 30000 });
const status = response.status();
if (status >= 400) {
throw new Error(`Download failed: HTTP ${status} ${response.statusText()}`);
}
contentType = response.headers()['content-type'] || 'application/octet-stream';
buffer = Buffer.from(await response.body());
if (buffer.length > 200 * 1024 * 1024) {
throw new Error('File too large (>200MB).');
}
}
// --base64 mode: return inline
if (isBase64) {
if (buffer.length > 10 * 1024 * 1024) {
throw new Error('File too large for --base64 (>10MB). Use disk download + GET /file instead.');
}
const mimeType = contentType.split(';')[0].trim();
return `data:${mimeType};base64,${buffer.toString('base64')}`;
}
// Write to disk
const ext = contentType.split(';')[0].includes('/')
? mimeToExt(contentType.split(';')[0].trim())
: '.bin';
const destPath = outputPath || path.join(TEMP_DIR, `browse-download-${Date.now()}${ext}`);
validateOutputPath(destPath);
fs.writeFileSync(destPath, buffer);
const sizeKB = Math.round(buffer.length / 1024);
return `Downloaded: ${destPath} (${sizeKB}KB, ${contentType.split(';')[0].trim()})`;
}
case 'scrape': {
if (args.length === 0) throw new Error('Usage: scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]');
const mediaType = args[0];
if (!['images', 'videos', 'media'].includes(mediaType)) {
throw new Error(`Invalid type: ${mediaType}. Use: images, videos, or media`);
}
// Parse flags
const selectorIdx = args.indexOf('--selector');
const selector = selectorIdx >= 0 ? args[selectorIdx + 1] : undefined;
const dirIdx = args.indexOf('--dir');
const dir = dirIdx >= 0 ? args[dirIdx + 1] : path.join(TEMP_DIR, `browse-scrape-${Date.now()}`);
const limitIdx = args.indexOf('--limit');
const limit = Math.min(limitIdx >= 0 ? parseInt(args[limitIdx + 1], 10) || 50 : 50, 200);
validateOutputPath(dir);
fs.mkdirSync(dir, { recursive: true });
const { extractMedia } = await import('./media-extract');
const target = bm.getActiveFrameOrPage();
const filter = mediaType === 'images' ? 'images' as const
: mediaType === 'videos' ? 'videos' as const
: undefined;
const mediaResult = await extractMedia(target, { selector, filter });
// Collect URLs to download
const urls: Array<{ url: string; type: string }> = [];
const seen = new Set<string>();
for (const img of mediaResult.images) {
const url = img.currentSrc || img.src || img.dataSrc;
if (url && !seen.has(url) && !url.startsWith('data:')) {
seen.add(url);
urls.push({ url, type: 'image' });
}
}
for (const vid of mediaResult.videos) {
const url = vid.currentSrc || vid.src;
if (url && !seen.has(url) && !url.startsWith('blob:') && !vid.isHLS && !vid.isDASH) {
seen.add(url);
urls.push({ url, type: 'video' });
}
}
for (const bg of mediaResult.backgroundImages) {
if (bg.url && !seen.has(bg.url)) {
seen.add(bg.url);
urls.push({ url: bg.url, type: 'image' });
}
}
const toDownload = urls.slice(0, limit);
const page = bm.getPage();
const manifest: any = {
url: page.url(),
scraped_at: new Date().toISOString(),
files: [] as any[],
total_size: 0,
succeeded: 0,
failed: 0,
};
const lines: string[] = [];
for (let i = 0; i < toDownload.length; i++) {
const { url, type } = toDownload[i];
try {
const response = await page.request.fetch(url, { timeout: 30000 });
if (response.status() >= 400) throw new Error(`HTTP ${response.status()}`);
const ct = response.headers()['content-type'] || 'application/octet-stream';
const ext = mimeToExt(ct.split(';')[0].trim());
const filename = `${type}-${String(i + 1).padStart(3, '0')}${ext}`;
const filePath = path.join(dir, filename);
const body = Buffer.from(await response.body());
try {
fs.writeFileSync(filePath, body);
} catch (writeErr: any) {
throw new Error(`Disk write failed: ${writeErr.message}`);
}
manifest.files.push({ path: filename, src: url, size: body.length, type: ct.split(';')[0].trim() });
manifest.total_size += body.length;
manifest.succeeded++;
lines.push(` [${i + 1}/${toDownload.length}] ${filename} (${Math.round(body.length / 1024)}KB)`);
} catch (err: any) {
manifest.files.push({ path: null, src: url, size: 0, type: '', error: err.message });
manifest.failed++;
lines.push(` [${i + 1}/${toDownload.length}] FAILED: ${err.message}`);
}
// 100ms delay between downloads
if (i < toDownload.length - 1) await new Promise(r => setTimeout(r, 100));
}
// Write manifest
fs.writeFileSync(path.join(dir, 'manifest.json'), JSON.stringify(manifest, null, 2));
return `Scraped ${toDownload.length} items to ${dir}/\n${lines.join('\n')}\n\nSummary: ${manifest.succeeded} succeeded, ${manifest.failed} failed, ${Math.round(manifest.total_size / 1024)}KB total`;
}
case 'archive': {
const page = bm.getPage();
const outputPath = args[0] || path.join(TEMP_DIR, `browse-archive-${Date.now()}.mhtml`);
validateOutputPath(outputPath);
try {
const cdp = await page.context().newCDPSession(page);
const { data } = await cdp.send('Page.captureSnapshot', { format: 'mhtml' });
await cdp.detach();
fs.writeFileSync(outputPath, data);
return `Archive saved: ${outputPath} (${Math.round(data.length / 1024)}KB, MHTML)`;
} catch (err: any) {
throw new Error(`MHTML archive requires Chromium CDP. Use 'text' or 'html' for raw page content. (${err.message})`);
}
}
default:
throw new Error(`Unknown write command: ${command}`);
}
}
/** Map MIME type to file extension. */
function mimeToExt(mime: string): string {
const map: Record<string, string> = {
'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif',
'image/webp': '.webp', 'image/svg+xml': '.svg', 'image/avif': '.avif',
'video/mp4': '.mp4', 'video/webm': '.webm', 'video/quicktime': '.mov',
'audio/mpeg': '.mp3', 'audio/wav': '.wav', 'audio/ogg': '.ogg',
'application/pdf': '.pdf', 'application/json': '.json',
'text/html': '.html', 'text/plain': '.txt',
};
return map[mime] || '.bin';
}