mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-07 05:56:41 +02:00
c15b805cd8
* feat(browse): TabSession loadedHtml + command aliases + DX polish primitives
Adds the foundation layer for Puppeteer-parity features:
- TabSession.loadedHtml + setTabContent/getLoadedHtml/clearLoadedHtml —
enables load-html content to survive context recreation (viewport --scale)
via in-memory replay. ASCII lifecycle diagram in the source explains the
clear-before-navigation contract.
- COMMAND_ALIASES + canonicalizeCommand() helper — single source of truth
for name aliases (setcontent / set-content / setContent → load-html),
consumed by server dispatch and chain prevalidation.
- buildUnknownCommandError() pure function — rich error messages with
Levenshtein-based "Did you mean" suggestions (distance ≤ 2, input
length ≥ 4 to skip 2-letter noise) and NEW_IN_VERSION upgrade hints.
- load-html registered in WRITE_COMMANDS + SCOPE_WRITE so scoped write
tokens can use it.
- screenshot and viewport descriptions updated for upcoming flags.
- New browse/test/dx-polish.test.ts (15 tests): alias canonicalization,
Levenshtein threshold + alphabetical tiebreak, short-input guard,
NEW_IN_VERSION upgrade hint, alias + scope integration invariants.
No consumers yet — pure additive foundation. Safe to bisect on its own.
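The "Did you mean" rule above (distance ≤ 2, input length ≥ 4, alphabetical tiebreak) can be sketched roughly as follows. This is a minimal illustration, not the code in commands.ts: `KNOWN_COMMANDS`, `levenshtein`, and `suggestCommand` are hypothetical names chosen for the example.

```typescript
// Hypothetical command list for illustration; the real set lives in commands.ts.
const KNOWN_COMMANDS = ['goto', 'load-html', 'screenshot', 'viewport'];

/** Classic dynamic-programming Levenshtein edit distance (two-row variant). */
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Returns the closest known command, or undefined when nothing is close enough. */
function suggestCommand(input: string, known: string[] = KNOWN_COMMANDS): string | undefined {
  if (input.length < 4) return undefined; // short-input guard: skip 2-letter noise
  let best: string | undefined;
  let bestDist = Infinity;
  // Candidates are sorted first, so a strict '<' makes the alphabetically
  // first candidate win any equal-distance tie.
  for (const cand of [...known].sort()) {
    const d = levenshtein(input, cand);
    if (d <= 2 && d < bestDist) {
      best = cand;
      bestDist = d;
    }
  }
  return best;
}
```

Sorting once and comparing with a strict less-than keeps the tiebreak implicit, which is the simplification review finding 3 describes.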
* feat(browse): accept file:// in goto with smart cwd/home-relative parsing
Extends validateNavigationUrl to accept file:// URLs scoped to safe dirs
(cwd + TEMP_DIR) via the existing validateReadPath policy. The workhorse is a
new normalizeFileUrl() helper that handles non-standard relative forms BEFORE
the WHATWG URL parser sees them:
file:///abs/path.html → unchanged
file://./docs/page.html → file://<cwd>/docs/page.html
file://~/Documents/page.html → file://<HOME>/Documents/page.html
file://docs/page.html → file://<cwd>/docs/page.html
file://localhost/abs/path → unchanged
file://host.example.com/... → rejected (UNC/network)
file:// and file:/// → rejected (would list a directory)
Host heuristic rejects segments with '.', ':', '\\', '%', IPv6 brackets, or
Windows drive-letter patterns — so file://docs.v1/page.html, file://127.0.0.1/x,
file://[::1]/x, and file://C:/Users/x are explicit errors.
Uses fileURLToPath() + pathToFileURL() from node:url (never string-concat) so
URL escapes like %20 decode correctly and Node rejects encoded-slash traversal
(%2F..%2F) outright.
Signature change: validateNavigationUrl now returns Promise<string> (the
normalized URL) instead of Promise<void>. Existing callers that ignore the
return value still compile — they just don't benefit from smart-parsing until
updated in follow-up commits. Callers will be migrated in the next few commits
(goto, diff, newTab, restoreState).
Rewrites the url-validation test file: updates existing tests for the new
return type, adds 20+ new tests covering every normalizeFileUrl shape variant,
URL-encoding edge cases, and path-traversal rejection.
References: codex consult v3 P1 findings on URL parser semantics and fileURLToPath.
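To make the node:url behavior described above concrete, here is a small sketch, assuming a POSIX host (Windows path rules differ). `tryFileUrlToPath` is a hypothetical helper for the example and is not part of url-validation.ts.

```typescript
import { fileURLToPath, pathToFileURL } from 'node:url';

// Decode a file:// URL to a filesystem path, returning null when Node rejects it.
// Hypothetical helper; url-validation.ts rethrows with a descriptive message instead.
function tryFileUrlToPath(href: string): string | null {
  try {
    return fileURLToPath(href);
  } catch {
    return null;
  }
}

// Percent-escapes are decoded properly (POSIX result shown).
const decoded = tryFileUrlToPath('file:///tmp/my%20page.html'); // '/tmp/my page.html'

// Encoded slashes are rejected outright, so %2F..%2F cannot smuggle a traversal.
const traversal = tryFileUrlToPath('file:///tmp/safe%2F..%2Fetc/passwd'); // null

// Going the other way, pathToFileURL re-encodes characters that need escaping.
const roundTrip = pathToFileURL('/tmp/my page.html').href; // 'file:///tmp/my%20page.html'
```

String concatenation would get none of this for free, which is why the commit insists on the node:url helpers.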
* feat(browse): BrowserManager deviceScaleFactor + setContent replay + file:// plumbing
Three tightly-coupled changes to BrowserManager, all in service of the
Puppeteer-parity workflow:
1. deviceScaleFactor + currentViewport tracking. New private fields (default
scale=1, viewport=1280x720) + setDeviceScaleFactor(scale, w, h) method.
deviceScaleFactor is a context-level Playwright option — changing it
requires recreateContext(). The method validates (finite number, 1-3 cap,
headed-mode rejected), stores new values, calls recreateContext(), and
rolls back the fields on failure so a bad call doesn't leave inconsistent
state. Context options at all three sites (launch, recreate happy path,
recreate fallback) now honor the stored values instead of hardcoding
1280x720.
2. BrowserState.loadedHtml + loadedHtmlWaitUntil. saveState captures per-tab
loadedHtml from the session; restoreState replays it via newSession.
setTabContent() — NOT bare page.setContent() — so TabSession.loadedHtml
is rehydrated and survives *subsequent* scale changes. In-memory only,
never persisted to disk (HTML may contain secrets or customer data).
3. newTab + restoreState now consume validateNavigationUrl's normalized
return value. file://./x, file://~/x, and bare-segment forms now take
effect at every navigation site, not just the top-level goto command.
Together these enable: load-html → viewport --scale 2 → viewport --scale 1.5
→ screenshot, with content surviving both context recreations. Codex v2 P0
flagged that bare page.setContent in restoreState would lose content on the
second scale change — this commit implements the rehydration path.
References: codex v2 P0 (TabSession rehydration), codex v3 P1 (4-caller
return value), plan Feature 3 + Feature 4.
* feat(browse): load-html, screenshot --selector, viewport --scale, alias dispatch
Wires the new handlers and dispatch logic that the previous commits made
possible:
write-commands.ts
- New 'load-html' case: validateReadPath for safe-dir scoping, stat-based
actionable errors (not found, directory, oversize), extension allowlist
(.html/.htm/.xhtml/.svg), magic-byte sniff with UTF-8 BOM strip accepting
any <[a-zA-Z!?] markup opener (not just <!doctype — bare fragments like
<div>...</div> work for setContent), 50MB cap via GSTACK_BROWSE_MAX_HTML_BYTES
override, frame-context rejection. Calls session.setTabContent() so replay
metadata is rehydrated.
- viewport command extended: optional [<WxH>], optional [--scale <n>],
scale-only variant reads current size via page.viewportSize(). Invalid
scale (NaN, Infinity, empty, out of 1-3) throws with named value. Headed
mode rejected explicitly.
- clearLoadedHtml() called BEFORE goto/back/forward/reload navigation
(not after) so a timed-out goto post-commit doesn't leave stale metadata
that could resurrect on a later context recreation. Codex v2 P1 catch.
- goto uses validateNavigationUrl's normalized return value.
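The magic-byte sniff described for load-html might look roughly like this. It is a sketch under assumptions: `looksLikeMarkup` is a hypothetical name, and tolerating leading whitespace and reading only the first 512 bytes are choices made for the example, not confirmed details of write-commands.ts.

```typescript
// Returns true when the bytes look like markup: an optional UTF-8 BOM, optional
// leading whitespace (assumption), then '<' followed by a letter, '!' or '?'.
function looksLikeMarkup(bytes: Uint8Array): boolean {
  const hasBom =
    bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf;
  // Only the head of the file is needed to classify it.
  const head = new TextDecoder('utf-8').decode(bytes.subarray(hasBom ? 3 : 0, 512));
  return /^\s*<[a-zA-Z!?]/.test(head);
}

const encoder = new TextEncoder();
const fragment = encoder.encode('<div>hi</div>');
const withBom = new Uint8Array([0xef, 0xbb, 0xbf, ...encoder.encode('<!doctype html>')]);
const pngMagic = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
```

Matching any `<[a-zA-Z!?]` opener rather than only `<!doctype` is what lets bare fragments through while still rejecting a PNG renamed to .html.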
meta-commands.ts
- screenshot --selector <css> flag: explicit element-screenshot form.
Rejected when combined with a positional selector (both = error); the existing
--clip conflict at line 161 is preserved, and it composes with --base64 at lines 168-174.
- chain canonicalizes each step with canonicalizeCommand — step shape is
now { rawName, name, args } so prevalidation, dispatch, WRITE_COMMANDS.has,
watch blocking, and result labels all use canonical names while audit
labels show 'rawName→name' when aliased. Codex v3 P2 catch — prior shape
only canonicalized at prevalidation and diverged everywhere else.
- diff command consumes validateNavigationUrl return value for both URLs.
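The chain step shape above can be illustrated with a short sketch. `COMMAND_ALIASES` and `canonicalizeCommand` are names from this commit, but their bodies here are assumptions, and `toChainStep` and `auditLabel` are hypothetical helpers for the example.

```typescript
// Assumed alias table; only the three aliases named in the commit message are listed.
const COMMAND_ALIASES = new Map<string, string>([
  ['setcontent', 'load-html'],
  ['set-content', 'load-html'],
  ['setContent', 'load-html'],
]);

// Map an alias to its canonical command name; unknown names pass through unchanged.
function canonicalizeCommand(raw: string): string {
  return COMMAND_ALIASES.get(raw) ?? raw;
}

interface ChainStep {
  rawName: string; // what the user typed
  name: string;    // canonical name used for validation and dispatch
  args: string[];
}

// Canonicalize once, up front, so every later check sees the same name.
function toChainStep(rawName: string, args: string[]): ChainStep {
  return { rawName, name: canonicalizeCommand(rawName), args };
}

// Audit label shows 'rawName→name' only when an alias was actually resolved.
function auditLabel(step: ChainStep): string {
  return step.rawName === step.name ? step.name : `${step.rawName}→${step.name}`;
}
```

Carrying both names on the step is what keeps prevalidation, dispatch, and labels from diverging.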
server.ts
- Command canonicalization inserted immediately after parse, before scope /
watch / tab-ownership / content-wrapping checks. rawCommand preserved for
future audit (not wired into audit log in this commit — follow-up).
- Unknown-command handler replaced with buildUnknownCommandError() from
commands.ts — produces 'Unknown command: X. Did you mean Y?' with optional
upgrade hint for NEW_IN_VERSION entries.
security-audit-r2.test.ts
- Updated chain-loop marker from 'for (const cmd of commands)' to
'for (const c of commands)' to match the new chain step shape. Same
isWatching + BLOCKED invariants still asserted.
* chore: bump version and changelog (v1.1.0.0)
- VERSION: 1.0.0.0 → 1.1.0.0 (MINOR bump — new user-facing commands)
- package.json: matching version bump
- CHANGELOG.md: new 1.1.0.0 entry describing load-html, screenshot --selector,
viewport --scale, file:// support, setContent replay, and DX polish in user
voice with a dedicated Security section for file:// safe-dirs policy
- browse/SKILL.md.tmpl: adds pattern #12 "Render local HTML", pattern #13
"Retina screenshots", and a full Puppeteer → browse cheatsheet with side-by-
side API mapping and a worked tweet-renderer migration example
- browse/SKILL.md + SKILL.md: regenerated from templates via `bun run gen:skill-docs`
to reflect the new command descriptions
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: pre-landing review fixes (9 findings from specialist + adversarial review)
Adversarial review (Claude subagent + Codex) surfaced 9 bugs across
CRITICAL/HIGH severity. All fixed:
1. tab-session.ts:setTabContent — state mutation moved AFTER the setContent
await. Prior order left phantom HTML in replay metadata if setContent
threw (timeout, browser crash), which a later viewport --scale would
silently replay. Now loadedHtml is only recorded on successful load.
2. browser-manager.ts:setDeviceScaleFactor — rollback now forces a second
recreateContext after restoring the old fields. The fallback path in
the original recreateContext builds a blank context using whatever
this.deviceScaleFactor/currentViewport hold at that moment (which were
the NEW values we were trying to apply). Rolling back the fields without
a second recreate left the live context at new-scale while state tracked
old-scale. Now: restore fields, force re-recreate with old values, only
if that ALSO fails do we return a combined error.
3. commands.ts:buildUnknownCommandError — Levenshtein tiebreak simplified
to 'd <= 2 && d < bestDist' (strict less). Candidates are pre-sorted
alphabetically, so first equal-distance wins by default. The prior
'(d === bestDist && best !== undefined && cand < best)' clause was dead
code.
4. tab-session.ts:onMainFrameNavigated — now clears loadedHtml, not just
refs + frame. Without this, a user who load-html'd then clicked a link
(or had a form submit / JS redirect / OAuth flow) would retain the stale
replay metadata. The next viewport --scale would silently revert the
tab to the ORIGINAL loaded HTML, losing whatever the post-navigation
content was. Silent data corruption. Browser-emitted navigations trigger
this path via wirePageEvents.
5. browser-manager.ts:saveState + restoreState — tab ownership now flows
through BrowserState.owner. Without this, a scoped agent's viewport
--scale would strand them: tab IDs change during recreate, ownership
map held stale IDs, owner lookup failed. New IDs had no owner, so
writes without tabId were denied (DoS). Worse, if the agent sent a
stale tabId the server's swallowed-tab-switch-error path would let the
command hit whatever tab was currently active (cross-tab authz bypass).
Now: clear ownership before restore, re-add per-tab with new IDs.
6. meta-commands.ts:state load — disk-loaded state.pages is now explicit
allowlist (url, isActive, storage:null) instead of object spread.
Spreading accepted loadedHtml, loadedHtmlWaitUntil, and owner from a
user-writable state file, letting a tampered state.json smuggle HTML
past load-html's safe-dirs / extension / magic-byte / 50MB-cap
validators, or forge tab ownership. Now stripped at the boundary.
7. url-validation.ts:normalizeFileUrl — preserves query string + fragment
across normalization. file://./app.html?route=home#login previously
resolved to a filesystem path that URL-encoded '?' as %3F and '#' as
%23, or (for absolute forms) pathToFileURL dropped them entirely. SPAs
and fixture URLs with query params 404'd or loaded the wrong route.
Now: split on ?/# before path resolution, reattach after.
8. url-validation.ts:validateNavigationUrl — reattaches parsed.search +
parsed.hash to the normalized file:// URL. Same fix at the main
validator for absolute paths that go through fileURLToPath round-trip.
9. server.ts:writeAuditEntry — audit entries now include aliasOf when the
user typed an alias ('setcontent' → cmd: 'load-html', aliasOf:
'setcontent'). Previously the isAliased variable was computed but
dropped, losing the raw input from the forensic trail. Completes the
plan's codex v3 P2 requirement.
Also added bm.getCurrentViewport() and switched 'viewport --scale'-
without-size to read from it (more reliable than page.viewportSize() on
headed/transition contexts).
Tests pass: exit 0, no failures. Build clean.
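Finding 2 (rollback plus a forced second recreate) can be sketched as below. This is an illustration under stated assumptions, not the BrowserManager code: `ScaleState` is a hypothetical class, and `recreate` is a synchronous stand-in for recreateContext() so the sketch runs without a browser (the real call is async and would be awaited).

```typescript
interface Viewport { width: number; height: number }
type Recreate = (scale: number, viewport: Viewport) => void;

const errorMessage = (e: unknown): string => (e instanceof Error ? e.message : String(e));

// Hypothetical stand-in for the scale-related state held by BrowserManager.
class ScaleState {
  scale = 1;
  viewport: Viewport = { width: 1280, height: 720 };
  private recreate: Recreate;

  constructor(recreate: Recreate) {
    this.recreate = recreate;
  }

  setDeviceScaleFactor(scale: number, width: number, height: number): void {
    if (!Number.isFinite(scale)) throw new Error(`scale ${scale} is not a finite number`);
    if (scale < 1 || scale > 3) throw new Error(`scale ${scale} must be between 1 and 3`);

    const previous = { scale: this.scale, viewport: this.viewport };
    this.scale = scale;
    this.viewport = { width, height };
    try {
      this.recreate(this.scale, this.viewport);
    } catch (err) {
      // Restore the fields, then rebuild the context with the OLD values so the
      // live context and the tracked state agree again.
      this.scale = previous.scale;
      this.viewport = previous.viewport;
      try {
        this.recreate(this.scale, this.viewport);
      } catch (rollbackErr) {
        throw new Error(
          `scale change failed (${errorMessage(err)}); rollback also failed (${errorMessage(rollbackErr)})`
        );
      }
      throw err;
    }
  }
}
```

Restoring the fields alone is not enough when the failed recreate has already built a fallback context from the new values, which is the inconsistency the fix addresses.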
* test: integration coverage for load-html, screenshot --selector, viewport --scale, replay, aliases
Adds 28 Playwright-integration tests that close the coverage gap flagged
by the ship-workflow coverage audit (50% → expected ~80%+).
**load-html (12 tests):**
- happy path loads HTML file, page text matches
- bare HTML fragments (<div>...</div>) accepted, not just full documents
- missing file arg throws usage
- non-.html extension rejected by allowlist
- /etc/passwd.html rejected by safe-dirs policy
- ENOENT path rejected with actionable "not found" error
- directory target rejected
- binary file (PNG magic bytes) disguised as .html rejected by magic-byte check
- UTF-8 BOM stripped before magic-byte check — BOM-prefixed HTML accepted
- --wait-until networkidle exercises non-default branch
- invalid --wait-until value rejected
- unknown flag rejected
**screenshot --selector (5 tests):**
- --selector flag captures element, validates Screenshot saved (element)
- conflicts with positional selector (both = error)
- conflicts with --clip (mutually exclusive)
- composes with --base64 (returns data:image/png;base64,...)
- missing value throws usage
**viewport --scale (5 tests):**
- WxH --scale 2 produces PNG with 2x element dimensions (parses IHDR bytes 16-23)
- --scale without WxH keeps current size + applies scale
- non-finite value (abc) throws "not a finite number"
- out-of-range (4, 0.5) throws "between 1 and 3"
- missing value throws
**setContent replay across context recreation (3 tests):**
- load-html → viewport --scale 2: content survives (hits setTabContent replay path)
- double cycle 2x → 1.5x: content still survives (proves TabSession rehydration)
- goto after load-html clears replay: subsequent viewport --scale does NOT
resurrect the stale HTML (validates the onMainFrameNavigated fix)
**Command aliases (2 tests):**
- setcontent routes to load-html via chain canonicalization
- set-content (hyphenated) also routes — both end-to-end through chain dispatch
Fixture paths use /tmp (SAFE_DIRECTORIES entry) instead of $TMPDIR which is
/var/folders/... on macOS and outside the safe-dirs boundary. Chain result
labels use rawName→name format when an alias is resolved (matches the
meta-commands.ts chain refactor).
Full suite: exit 0, 223/223 pass.
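The "parses IHDR bytes 16-23" check used by the viewport --scale test can be sketched as follows. `pngDimensions` is a hypothetical helper for the example; the byte offsets follow from the PNG layout (8-byte signature, 4-byte chunk length, 4-byte 'IHDR' type, then big-endian width and height).

```typescript
// Read pixel dimensions straight from the IHDR chunk of a PNG buffer.
// Width is the big-endian uint32 at bytes 16-19, height at bytes 20-23.
function pngDimensions(png: Uint8Array): { width: number; height: number } {
  if (png.length < 24) throw new Error('PNG too short to contain IHDR dimensions');
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) }; // big-endian by default
}

// Minimal 24-byte header for a 400x200 image (signature + IHDR length/type + dimensions).
const sampleHeader = new Uint8Array([
  0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, // PNG signature
  0x00, 0x00, 0x00, 0x0d,                         // IHDR data length (13)
  0x49, 0x48, 0x44, 0x52,                         // 'IHDR'
  0x00, 0x00, 0x01, 0x90,                         // width  = 400
  0x00, 0x00, 0x00, 0xc8,                         // height = 200
]);
```

A test can then screenshot the same element at scale 1 and scale 2 and compare the two results without decoding any pixels.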
* docs: update BROWSER.md + CHANGELOG for v1.1.0.0
BROWSER.md:
- Command reference table updated: goto now lists file:// support,
load-html added to Navigate row, viewport flagged with --scale
option, screenshot row shows --selector + --base64 flags
- Screenshot modes table adds the fifth mode (element crop via
--selector flag) and notes the tag-selector-not-caught-positionally
gotcha
- New "Retina screenshots — viewport --scale" subsection explains
deviceScaleFactor mechanics, context recreation side effects, and
headed-mode rejection
- New "Loading local HTML — goto file:// vs load-html" subsection
explains the two paths, their tradeoffs (URL state, relative asset
resolution), the safe-dirs policy, extension allowlist + magic-byte
sniff, 50MB cap, setContent replay across recreateContext, and the
alias routing (setcontent → load-html before scope check)
CHANGELOG.md (v1.1.0.0 security section expanded, no existing content
removed):
- State files cannot smuggle HTML or forge tab ownership (allowlist
on disk-loaded page fields)
- Audit log records aliasOf when a canonical command was reached via
an alias (setcontent → load-html)
- load-html content clears on real navigations (clicks, form submits,
JS redirects) — not just explicit goto. Also notes SPA query/fragment
preservation for goto file://
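The disk-load allowlist mentioned in the security notes above can be sketched like this. It is an assumption-laden illustration: `SavedPage` and `sanitizeLoadedPages` are hypothetical names, and dropping entries without a string url is a choice made for the example.

```typescript
interface SavedPage {
  url: string;
  isActive: boolean;
  storage: null;
}

// Rebuild each page from an explicit allowlist of fields instead of spreading the
// parsed JSON, so extra keys (loadedHtml, loadedHtmlWaitUntil, owner) never survive.
function sanitizeLoadedPages(raw: unknown): SavedPage[] {
  if (!Array.isArray(raw)) return [];
  const pages: SavedPage[] = [];
  for (const entry of raw) {
    if (typeof entry !== 'object' || entry === null) continue;
    const candidate = entry as Record<string, unknown>;
    if (typeof candidate.url !== 'string') continue; // assumption: skip malformed entries
    pages.push({ url: candidate.url, isActive: candidate.isActive === true, storage: null });
  }
  return pages;
}

// A tampered state file trying to smuggle HTML and forge ownership.
const tampered = JSON.parse(
  '[{"url":"https://example.com","isActive":true,"loadedHtml":"<script>x</script>","owner":"agent-1"}]'
);
const cleaned = sanitizeLoadedPages(tampered);
```

Because fields are copied by name, anything new added to the in-memory state later stays out of the disk path until someone allowlists it on purpose.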
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
300 lines
12 KiB
TypeScript
/**
 * URL validation for navigation commands: blocks dangerous schemes and cloud metadata endpoints.
 * Localhost and private IPs are allowed (primary use case: QA testing local dev servers).
 */

import { fileURLToPath, pathToFileURL } from 'node:url';
import * as path from 'node:path';
import * as os from 'node:os';
import { validateReadPath } from './path-security';

export const BLOCKED_METADATA_HOSTS = new Set([
  '169.254.169.254', // AWS/GCP/Azure instance metadata
  'fe80::1', // IPv6 link-local, a common metadata endpoint alias
  '::ffff:169.254.169.254', // IPv4-mapped IPv6 form of the metadata IP
  '::ffff:a9fe:a9fe', // Hex-encoded IPv4-mapped form (URL constructor normalizes to this)
  '::a9fe:a9fe', // Deprecated IPv4-compatible hex form
  'metadata.google.internal', // GCP metadata
  'metadata.azure.internal', // Azure IMDS
]);

/**
 * IPv6 prefixes to block (CIDR-style). Any address starting with these
 * hex prefixes is rejected. Covers the full ULA range (fc00::/7, i.e. fc00::/8 plus fd00::/8).
 */
const BLOCKED_IPV6_PREFIXES = ['fc', 'fd'];

/**
 * Check if an IPv6 address falls within a blocked prefix range.
 * Handles the full ULA range (fc00::/7), not just the exact literal fd00::.
 * Only matches actual IPv6 addresses (must contain ':'), not hostnames
 * like fd.example.com or fcustomer.com.
 */
function isBlockedIpv6(addr: string): boolean {
  const normalized = addr.toLowerCase().replace(/^\[|\]$/g, '');
  // Must contain a colon to be an IPv6 address; avoids false positives on
  // hostnames like fd.example.com or fcustomer.com
  if (!normalized.includes(':')) return false;
  return BLOCKED_IPV6_PREFIXES.some(prefix => normalized.startsWith(prefix));
}

/**
 * Normalize hostname for blocklist comparison:
 * - Strip IPv6 brackets (URL.hostname includes [] for IPv6)
 * - Strip trailing dot (DNS fully-qualified notation)
 *
 * Hex (0xA9FEA9FE) and decimal (2852039166) IP representations are not
 * resolved here; isMetadataIp() handles them via the URL constructor.
 */
function normalizeHostname(hostname: string): string {
  // Strip IPv6 brackets
  let h = hostname.startsWith('[') && hostname.endsWith(']')
    ? hostname.slice(1, -1)
    : hostname;
  // Strip trailing dot
  if (h.endsWith('.')) h = h.slice(0, -1);
  return h;
}

/**
 * Check if a hostname resolves to the link-local metadata IP 169.254.169.254.
 * Catches hex (0xA9FEA9FE), decimal (2852039166), and octal (0251.0376.0251.0376) forms.
 */
function isMetadataIp(hostname: string): boolean {
  // Try to parse as a numeric IP via the URL constructor, which normalizes all forms
  try {
    const probe = new URL(`http://${hostname}`);
    const normalized = probe.hostname;
    if (BLOCKED_METADATA_HOSTS.has(normalized) || isBlockedIpv6(normalized)) return true;
    // Also check after stripping trailing dot
    if (normalized.endsWith('.') && BLOCKED_METADATA_HOSTS.has(normalized.slice(0, -1))) return true;
  } catch {
    // Not a valid hostname, so it can't be a metadata IP
  }
  return false;
}

/**
 * Resolve a hostname to its IP addresses and check if any resolve to blocked metadata IPs.
 * Mitigates DNS rebinding: even if the hostname looks safe, the resolved IP might not be.
 *
 * Checks both A (IPv4) and AAAA (IPv6) records, since an attacker can use AAAA-only DNS to
 * bypass IPv4-only checks. Each record family is tried independently; failure of one
 * (e.g. no AAAA records exist) is not treated as a rebinding risk.
 */
async function resolvesToBlockedIp(hostname: string): Promise<boolean> {
  try {
    const dns = await import('node:dns');
    const { resolve4, resolve6 } = dns.promises;

    // Check IPv4 A records
    const v4Check = resolve4(hostname).then(
      (addresses) => addresses.some(addr => BLOCKED_METADATA_HOSTS.has(addr)),
      () => false, // ENODATA / ENOTFOUND: no A records, not a risk
    );

    // Check IPv6 AAAA records (the gap that issue #668 identified)
    const v6Check = resolve6(hostname).then(
      (addresses) => addresses.some(addr => {
        const normalized = addr.toLowerCase();
        return BLOCKED_METADATA_HOSTS.has(normalized) || isBlockedIpv6(normalized) ||
          // fe80::/10 is link-local, so always block (covers all fe80:: addresses)
          normalized.startsWith('fe80:');
      }),
      () => false, // ENODATA / ENOTFOUND: no AAAA records, not a risk
    );

    const [v4Blocked, v6Blocked] = await Promise.all([v4Check, v6Check]);
    return v4Blocked || v6Blocked;
  } catch {
    // Unexpected error: fail open (don't block navigation on DNS infrastructure failure)
    return false;
  }
}

/**
 * Normalize non-standard file:// URLs into absolute form before the WHATWG URL parser
 * sees them. Handles cwd-relative, home-relative, and bare-segment shapes that the
 * standard parser would otherwise mis-interpret as hostnames.
 *
 *   file:///abs/path.html        → unchanged
 *   file://./<rel>               → file://<cwd>/<rel>
 *   file://~/<rel>               → file://<HOME>/<rel>
 *   file://<single-segment>/...  → file://<cwd>/<single-segment>/... (cwd-relative)
 *   file://localhost/<abs>       → unchanged
 *   file://<host-like>/...       → rejected here by the host heuristic
 *
 * Rejects empty (file://) and root-only (file:///) URLs, because these would silently
 * trigger Chromium's directory listing, which is a different product surface.
 */
export function normalizeFileUrl(url: string): string {
  if (!url.toLowerCase().startsWith('file:')) return url;

  // Split off query + fragment BEFORE touching the path, because SPAs and fixture
  // URLs rely on these. If left in the path, `?` and `#` would be percent-encoded
  // as `%3F`/`%23` when the resolved path is converted back with pathToFileURL,
  // silently routing preview URLs to the wrong fixture. Extract, normalize the
  // path, reattach at the end.
  //
  // Parse order: `?` before `#` per RFC 3986; a '?' inside a fragment is literal.
  // Find the FIRST `?` or `#`, whichever comes first, and take everything
  // after (including the delimiter) as the trailing segment.
  const qIdx = url.indexOf('?');
  const hIdx = url.indexOf('#');
  let delimIdx = -1;
  if (qIdx >= 0 && hIdx >= 0) delimIdx = Math.min(qIdx, hIdx);
  else if (qIdx >= 0) delimIdx = qIdx;
  else if (hIdx >= 0) delimIdx = hIdx;

  const pathPart = delimIdx >= 0 ? url.slice(0, delimIdx) : url;
  const trailing = delimIdx >= 0 ? url.slice(delimIdx) : '';

  const rest = pathPart.slice('file:'.length);

  // file:/// or longer → standard absolute; pass through unchanged (caller validates path).
  if (rest.startsWith('///')) {
    // Reject bare root-only (file:/// with nothing after)
    if (rest === '///' || rest === '////') {
      throw new Error('Invalid file URL: file:/// has no path. Use file:///<absolute-path>.');
    }
    return pathPart + trailing;
  }

  // Everything else: must start with // (we accept file://... only)
  if (!rest.startsWith('//')) {
    throw new Error(`Invalid file URL: ${url}. Use file:///<absolute-path> or file://./<rel> or file://~/<rel>.`);
  }

  const afterDoubleSlash = rest.slice(2);

  // Reject empty (file://) and directory-only forms (file://./ would list cwd,
  // file://~/ would list the home directory).
  if (afterDoubleSlash === '') {
    throw new Error('Invalid file URL: file:// is empty. Use file:///<absolute-path>.');
  }
  if (afterDoubleSlash === '.' || afterDoubleSlash === './') {
    throw new Error('Invalid file URL: file://./ would list the current directory. Use file://./<filename> to render a specific file.');
  }
  if (afterDoubleSlash === '~' || afterDoubleSlash === '~/') {
    throw new Error('Invalid file URL: file://~/ would list the home directory. Use file://~/<filename> to render a specific file.');
  }

  // Home-relative: file://~/<rel>
  if (afterDoubleSlash.startsWith('~/')) {
    const rel = afterDoubleSlash.slice(2);
    const absPath = path.join(os.homedir(), rel);
    return pathToFileURL(absPath).href + trailing;
  }

  // cwd-relative with explicit ./ : file://./<rel>
  if (afterDoubleSlash.startsWith('./')) {
    const rel = afterDoubleSlash.slice(2);
    const absPath = path.resolve(process.cwd(), rel);
    return pathToFileURL(absPath).href + trailing;
  }

  // localhost host explicitly allowed: file://localhost/<abs> (pass through to standard parser).
  if (afterDoubleSlash.toLowerCase().startsWith('localhost/')) {
    return pathPart + trailing;
  }

  // Ambiguous: file://<segment>/<rest>. Treat as cwd-relative ONLY if <segment> is a
  // simple path name (no dots, no colons, no backslashes, no percent-encoding, no
  // IPv6 brackets, no Windows drive letter pattern).
  const firstSlash = afterDoubleSlash.indexOf('/');
  const segment = firstSlash === -1 ? afterDoubleSlash : afterDoubleSlash.slice(0, firstSlash);

  // Reject host-like segments: dotted names (docs.v1), IPs (127.0.0.1), IPv6 ([::1]),
  // drive letters (C:), percent-encoded, or backslash paths.
  const looksLikeHost = /[.:\\%]/.test(segment) || segment.startsWith('[');
  if (looksLikeHost) {
    throw new Error(
      `Unsupported file URL host: ${segment}. Use file:///<absolute-path> for local files (network/UNC paths are not supported).`
    );
  }

  // Simple-segment cwd-relative: file://docs/page.html → cwd/docs/page.html
  const absPath = path.resolve(process.cwd(), afterDoubleSlash);
  return pathToFileURL(absPath).href + trailing;
}

/**
 * Validate a navigation URL and return a normalized version suitable for page.goto().
 *
 * Callers MUST use the return value: normalization of non-standard file:// forms
 * only takes effect at the navigation site, not at the original URL.
 *
 * Callers (keep this list current, grep before removing):
 *   - write-commands.ts:goto
 *   - meta-commands.ts:diff (both URL args)
 *   - browser-manager.ts:newTab
 *   - browser-manager.ts:restoreState
 */
export async function validateNavigationUrl(url: string): Promise<string> {
  // Normalize non-standard file:// shapes before the URL parser sees them.
  let normalized = url;
  if (url.toLowerCase().startsWith('file:')) {
    normalized = normalizeFileUrl(url);
  }

  let parsed: URL;
  try {
    parsed = new URL(normalized);
  } catch {
    throw new Error(`Invalid URL: ${url}`);
  }

  // file:// path: validate against safe-dirs and allow; otherwise defer to http(s) logic.
  if (parsed.protocol === 'file:') {
    // Reject non-empty non-localhost hosts (UNC / network paths).
    if (parsed.host !== '' && parsed.host.toLowerCase() !== 'localhost') {
      throw new Error(
        `Unsupported file URL host: ${parsed.host}. Use file:///<absolute-path> for local files.`
      );
    }

    // Convert URL → filesystem path with proper decoding (%20 becomes a space;
    // encoded slashes such as %2F make fileURLToPath throw).
    // fileURLToPath strips query + hash; we reattach them after validation so SPA
    // fixture URLs like file:///tmp/app.html?route=home#login survive intact.
    let fsPath: string;
    try {
      fsPath = fileURLToPath(parsed);
    } catch (e: any) {
      throw new Error(`Invalid file URL: ${url} (${e.message})`);
    }

    // Reject path traversal after decoding. Encoded-slash forms such as
    // file:///tmp/safe%2F..%2Fetc/passwd are already rejected by fileURLToPath above;
    // any '..' that survives decoding is normalized by path.resolve, and the
    // result is checked against the safe dirs.
    validateReadPath(fsPath);

    // Return the canonical file:// URL derived from the filesystem path + original
    // query + hash. This guarantees page.goto() gets a well-formed URL regardless
    // of input shape while preserving SPA route/query params.
    return pathToFileURL(fsPath).href + parsed.search + parsed.hash;
  }

  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(
      `Blocked: scheme "${parsed.protocol}" is not allowed. Only http:, https:, and file: URLs are permitted.`
    );
  }

  const hostname = normalizeHostname(parsed.hostname.toLowerCase());

  if (BLOCKED_METADATA_HOSTS.has(hostname) || isMetadataIp(hostname) || isBlockedIpv6(hostname)) {
    throw new Error(
      `Blocked: ${parsed.hostname} is a cloud metadata endpoint. Access is denied for security.`
    );
  }

  // DNS rebinding protection: resolve hostname and check if it points to metadata IPs.
  // Skip for loopback/private IPs, because they can't be DNS-rebinded and the async DNS
  // resolution adds latency that breaks concurrent E2E tests under load.
  const isLoopback = hostname === 'localhost' || hostname === '127.0.0.1' || hostname === '::1';
  const isPrivateNet = /^(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.)/.test(hostname);
  if (!isLoopback && !isPrivateNet && await resolvesToBlockedIp(hostname)) {
    throw new Error(
      `Blocked: ${parsed.hostname} resolves to a cloud metadata IP. Possible DNS rebinding attack.`
    );
  }

  return url;
}
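As a quick illustration of why isMetadataIp() probes hostnames through the URL constructor, the sketch below shows the WHATWG parser collapsing alternative numeric spellings of 169.254.169.254 into one canonical form. `canonicalHost` is a hypothetical helper written for this example and is not part of the file above.

```typescript
// Returns the hostname as the WHATWG URL parser canonicalizes it, or null if unparsable.
function canonicalHost(host: string): string | null {
  try {
    return new URL(`http://${host}`).hostname;
  } catch {
    return null;
  }
}

const fromHex = canonicalHost('0xA9FEA9FE');            // '169.254.169.254'
const fromDecimal = canonicalHost('2852039166');        // '169.254.169.254'
const fromOctal = canonicalHost('0251.0376.0251.0376'); // '169.254.169.254'
// IPv4-mapped IPv6 is rewritten to hex groups and keeps its brackets.
const fromMapped = canonicalHost('[::ffff:169.254.169.254]'); // '[::ffff:a9fe:a9fe]'
```

Because every spelling funnels into one string, a plain Set lookup on the normalized hostname is enough; the bracketed IPv6 result is also why the blocklist carries the hex-encoded '::ffff:a9fe:a9fe' entry and why brackets are stripped before comparison.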