v1.38.1.0 fix wave: surrogate-safe page captures (#1440), Implementation Tasks across review skills (#1454), root-level artifact patterns (#1452) (#1504)

* fix(browse): sanitize lone Unicode surrogates at commandResult chokepoint + /batch envelope (#1440)

Page captures with mixed-script Unicode round-trip cleanly to the Claude API.
Two new utilities in browse/src/sanitize.ts: stripLoneSurrogates for raw UTF-16
strings, stripLoneSurrogateEscapes for \uXXXX JSON escape text. sanitizeBody
picks the right pass based on cr.json.
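
The raw-string case can be illustrated with a minimal sketch. `replaceLoneSurrogates` is a hypothetical stand-in, not the real `stripLoneSurrogates` export, and it uses a single alternation regex with a replacer rather than the lookaround regexes in browse/src/sanitize.ts. The last step assumes an ES2019+ runtime, where JSON.stringify emits a lone surrogate as escape text.

```typescript
// Illustrative stand-in, not the real browse/src/sanitize.ts export.
// Technique: one alternation regex plus a replacer (no lookaround).
function replaceLoneSurrogates(s: string): string {
  return s.replace(
    /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
    (m) => (m.length === 2 ? m : '\uFFFD'), // keep valid pairs, replace lone halves
  );
}

const broken = 'ok \uD83D end';       // high surrogate with no low half
const paired = 'ok \uD83D\uDE00 end'; // complete surrogate pair (one emoji)

const cleaned = replaceLoneSurrogates(broken);   // lone half becomes U+FFFD
const untouched = replaceLoneSurrogates(paired); // valid pair is preserved

// After JSON.stringify the same lone surrogate is six characters of escape
// text, so a raw code-unit regex no longer sees it.
const escaped = JSON.stringify(broken);
console.log(cleaned === 'ok \uFFFD end', untouched === paired, escaped);
```

This is why a second pass over `\uXXXX` escape text is described for bodies that have already been stringified.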

buildCommandResponse is extracted from handleCommand and exported; it
applies sanitization before new Response(). /batch was bypassing this
chokepoint via direct JSON.stringify, so it now sanitizes each cr.result
before pushing AND wraps the envelope with stripLoneSurrogateEscapes. As
defense in depth, getCleanText, getCleanTextWithStripping, html,
accessibility, and the snapshot.ts return points also sanitize their output,
so downstream consumers (datamarking, envelope wrapping) see sanitized text
before the response is built.
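
A minimal sketch of the envelope pass, under stated assumptions: `BatchEntry` is a hypothetical, trimmed-down result shape, and `replaceLoneSurrogateEscapes` is an illustrative stand-in for `stripLoneSurrogateEscapes` that uses an alternation regex with a replacer instead of lookaround. It assumes an ES2019+ runtime, where JSON.stringify emits lone surrogates as `\uXXXX` escape text.

```typescript
// Hypothetical result shape; the real /batch entries carry more fields.
interface BatchEntry {
  index: number;
  status: number;
  result: string;
}

// Illustrative escape-text pass: keeps a high escape immediately followed by a
// low escape (12 characters) and rewrites any other surrogate escape to \uFFFD.
function replaceLoneSurrogateEscapes(json: string): string {
  const re =
    /\\u[dD][89abAB][0-9a-fA-F]{2}\\u[dD][c-fC-F][0-9a-fA-F]{2}|\\u[dD][89a-fA-F][0-9a-fA-F]{2}/g;
  return json.replace(re, (m) => (m.length === 12 ? m : '\\uFFFD'));
}

const entries: BatchEntry[] = [
  { index: 0, status: 200, result: 'broken \uD83D emoji' },
  { index: 1, status: 200, result: 'fine \uD83D\uDE00 emoji' },
];

// Stringify the envelope, then sanitize the resulting JSON text.
const envelope = replaceLoneSurrogateEscapes(JSON.stringify({ results: entries }));
const parsed = JSON.parse(envelope) as { results: BatchEntry[] };
console.log(parsed.results.map((r) => r.result));
```

The sketch skips the per-result pass to show that the envelope pass alone catches a surrogate that reaches JSON.stringify unsanitized.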

25 new unit tests across sanitize.test.ts and build-command-response.test.ts.
content-security.test.ts updated to accept either pre- or post-sanitize form
of the snapshot scoped branch (source-level regression check).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat: bug fix wave v1.36.0.0 — Implementation Tasks, allowlist patterns, surrogate-safe page captures (#1440 #1452 #1454)

Three filed issues land together:

#1440 — Page captures from real-world HTML hit 'API Error 400: no low
surrogate in string'. Sanitizers + buildCommandResponse extraction shipped in
the prior commit; this commit adds the migration script that patches existing
brain-allowlist/privacy-map/gitattributes installs and the supporting tests.

#1452 — Federation sync was silently skipping root-level design and test-plan
docs. bin/gstack-artifacts-init adds two patterns to all three managed blocks
(.brain-allowlist, .brain-privacy-map.json, .gitattributes). Idempotent
migration v1.36.0.0.sh repairs existing installs in place via jq (preserves
JSON validity) — no commit + push from the migration.
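
The idempotency property can be sketched in TypeScript, although the real migration is a shell script that uses jq. The pattern strings below are placeholders (the source does not list the real ones), and `addPatterns` is a hypothetical helper.

```typescript
// Placeholder patterns; the source does not list the real ones.
const NEW_PATTERNS = ['example-design-pattern', 'example-test-plan-pattern'];

// Append only the patterns that are missing, so a second run changes nothing.
function addPatterns(existing: string[], additions: string[]): string[] {
  const merged = existing.slice();
  for (const pattern of additions) {
    if (merged.indexOf(pattern) === -1) merged.push(pattern);
  }
  return merged;
}

const once = addPatterns(['docs/**'], NEW_PATTERNS);
const twice = addPatterns(once, NEW_PATTERNS); // same contents as `once`
console.log(once.length, twice.length);
```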

#1454 — All four review skills (CEO/design/eng/DX) emit an Implementation
Tasks markdown section AND write a jq-built JSONL artifact per phase.
/autoplan reads all four files, scopes by current branch + 5-commit window,
dedupes on exact (component, sorted(files), title), and renders an aggregated
list in the Final Approval Gate.
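
The dedupe key can be sketched as follows. `ImplementationTask` is a hypothetical shape that carries only the three fields named above; the real JSONL records may hold more. Branch and commit-window scoping are not shown.

```typescript
// Hypothetical task shape; only component, files, and title come from the text.
interface ImplementationTask {
  component: string;
  files: string[];
  title: string;
}

// Keep the first task for each exact (component, sorted(files), title) key.
function dedupeTasks(tasks: ImplementationTask[]): ImplementationTask[] {
  const seen = new Set<string>();
  const unique: ImplementationTask[] = [];
  for (const task of tasks) {
    // Sort a copy so file order does not affect the key or mutate the input.
    const key = JSON.stringify([task.component, task.files.slice().sort(), task.title]);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(task);
    }
  }
  return unique;
}

const sample: ImplementationTask[] = [
  { component: 'browse', files: ['b.ts', 'a.ts'], title: 'Add sanitizer' },
  { component: 'browse', files: ['a.ts', 'b.ts'], title: 'Add sanitizer' }, // same key
  { component: 'browse', files: ['a.ts'], title: 'Add sanitizer' },         // different files
];
const uniqueTasks = dedupeTasks(sample);
console.log(uniqueTasks.length);
```

Serializing the tuple with JSON.stringify avoids ambiguity that a plain delimiter-joined key could introduce when a title or path contains the delimiter.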

New tests:
- browse/test/sanitize.test.ts (18 cases)
- browse/test/build-command-response.test.ts (7 cases)
- test/artifacts-init-migration.test.ts (7 cases)

VERSION → 1.36.0.0. Skips the v1.34.x slot taken by 'gstack consumable as
submodule' and the v1.35.0.0 slot taken by /document-generate. #1428 was
shipped separately by v1.34.2.0 with a different approach; follow-up #1503
filed for the bare-path filesystem boundary concern surfaced during our
analysis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: bump to v1.38.1.0

VERSION + package.json + CHANGELOG header + migration filename + test
reference all consistently at v1.38.1.0. Migration renamed:
gstack-upgrade/migrations/v1.38.0.0.sh -> v1.38.1.0.sh.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Author: Garry Tan
Date: 2026-05-14 21:46:50 -07:00
Committer: GitHub
parent 3bf43766d5
commit ea51b45e08
27 changed files with 1248 additions and 19 deletions
+3 -1
@@ -12,6 +12,7 @@
import { randomBytes } from 'crypto';
import type { Page, Frame } from 'playwright';
+import { stripLoneSurrogates } from './sanitize';
// ─── Datamarking (Layer 1) ──────────────────────────────────────
@@ -167,7 +168,7 @@ export async function markHiddenElements(page: Page | Frame): Promise<string[]>
* Uses clone + remove approach: clones body, removes marked elements, returns innerText.
*/
export async function getCleanTextWithStripping(page: Page | Frame): Promise<string> {
-return page.evaluate(() => {
+const raw = await page.evaluate(() => {
const body = document.body;
if (!body) return '';
const clone = body.cloneNode(true) as HTMLElement;
@@ -181,6 +182,7 @@ export async function getCleanTextWithStripping(page: Page | Frame): Promise<str
.filter(line => line.length > 0)
.join('\n');
});
+return stripLoneSurrogates(raw);
}
/**
+7 -5
@@ -14,6 +14,7 @@ import * as path from 'path';
import { TEMP_DIR } from './platform';
import { inspectElement, formatInspectorResult, getModificationHistory } from './cdp-inspector';
import { validateReadPath } from './path-security';
+import { stripLoneSurrogates } from './sanitize';
// Re-export for backward compatibility (tests import from read-commands)
export { validateReadPath } from './path-security';
@@ -50,7 +51,7 @@ function wrapForEvaluate(code: string): string {
* Exported for DRY reuse in meta-commands (diff).
*/
export async function getCleanText(page: Page | Frame): Promise<string> {
-return page.evaluate(() => {
+const raw = await page.evaluate(() => {
const body = document.body;
if (!body) return '';
const clone = body.cloneNode(true) as HTMLElement;
@@ -61,6 +62,7 @@ export async function getCleanText(page: Page | Frame): Promise<string> {
.filter(line => line.length > 0)
.join('\n');
});
+return stripLoneSurrogates(raw);
}
/**
@@ -115,9 +117,9 @@ export async function handleReadCommand(
if (selector) {
const resolved = await session.resolveRef(selector);
if ('locator' in resolved) {
-return resolved.locator.innerHTML({ timeout: 5000 });
+return stripLoneSurrogates(await resolved.locator.innerHTML({ timeout: 5000 }));
}
-return target.locator(resolved.selector).innerHTML({ timeout: 5000 });
+return stripLoneSurrogates(await target.locator(resolved.selector).innerHTML({ timeout: 5000 }));
}
// page.content() is page-only; use evaluate for frame compat
const doctype = await target.evaluate(() => {
@@ -125,7 +127,7 @@ export async function handleReadCommand(
return dt ? `<!DOCTYPE ${dt.name}>` : '';
});
const html = await target.evaluate(() => document.documentElement.outerHTML);
-return doctype ? `${doctype}\n${html}` : html;
+return stripLoneSurrogates(doctype ? `${doctype}\n${html}` : html);
}
case 'links': {
@@ -173,7 +175,7 @@ export async function handleReadCommand(
case 'accessibility': {
const snapshot = await target.locator("body").ariaSnapshot();
-return snapshot;
+return stripLoneSurrogates(snapshot);
}
case 'js': {
+34
@@ -0,0 +1,34 @@
// Lone Unicode surrogate sanitization.
//
// Lone surrogates (\uD800-\uDFFF without a matching pair) can occur in JS
// strings but cannot be encoded as well-formed UTF-8, so the serialized body is
// rejected by the Claude API with HTTP 400 "no low surrogate in string". Page
// captures from real-world HTML hit this when content contains broken emoji
// bytes or mid-emoji splits.
//
// Two sanitizers are needed because both forms appear in browse responses:
// - Raw UTF-16 surrogates in text/plain bodies (pre-stringify state).
// - JSON \uXXXX escape sequences after JSON.stringify already ran.
// Both replace lone surrogates with U+FFFD (replacement character).
const LONE_SURROGATE_HIGH = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])/g;
const LONE_SURROGATE_LOW = /(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;
export function stripLoneSurrogates(s: string): string {
return s.replace(LONE_SURROGATE_HIGH, '\uFFFD').replace(LONE_SURROGATE_LOW, '\uFFFD');
}
// Escape-text equivalents: a \uD800-\uDBFF escape not followed by a
// \uDC00-\uDFFF escape (lone high), or a \uDC00-\uDFFF escape not preceded by
// a \uD800-\uDBFF escape (lone low).
const LONE_SURROGATE_HIGH_ESCAPE = /\\u[Dd][89ABab][0-9A-Fa-f]{2}(?!\\u[Dd][C-Fc-f][0-9A-Fa-f]{2})/g;
const LONE_SURROGATE_LOW_ESCAPE = /(?<!\\u[Dd][89ABab][0-9A-Fa-f]{2})\\u[Dd][C-Fc-f][0-9A-Fa-f]{2}/g;
export function stripLoneSurrogateEscapes(s: string): string {
return s.replace(LONE_SURROGATE_HIGH_ESCAPE, '\\uFFFD').replace(LONE_SURROGATE_LOW_ESCAPE, '\\uFFFD');
}
// Pick the right sanitizer based on whether the body has already been JSON-stringified.
// For application/json bodies, run both passes: raw first (in case the JSON encoder
// emitted surrogates as-is rather than escaping), then escape-text.
export function sanitizeBody(body: string, isJson: boolean): string {
return isJson ? stripLoneSurrogateEscapes(stripLoneSurrogates(body)) : stripLoneSurrogates(body);
}
+27 -7
@@ -42,6 +42,7 @@ import { inspectElement, modifyStyle, resetModifications, getModificationHistory
// Bun.spawn used instead of child_process.spawn (compiled bun binaries
// fail posix_spawn on all executables including /bin/bash)
import { safeUnlink, safeUnlinkQuiet, safeKill } from './error-handling';
+import { sanitizeBody, stripLoneSurrogateEscapes } from './sanitize';
import { startSocksBridge, testUpstream, type BridgeHandle } from './socks-bridge';
import { parseProxyConfig, toUpstreamConfig, ProxyConfigError } from './proxy-config';
import { redactProxyUrl } from './proxy-redact';
@@ -1079,16 +1080,28 @@ async function handleCommandInternal(
return { ...cr, result: sanitizeLoneSurrogates(cr.result) };
}
-/** HTTP wrapper — converts CommandResult to Response */
-async function handleCommand(body: any, tokenInfo?: TokenInfo | null): Promise<Response> {
-const cr = await handleCommandInternal(body, tokenInfo);
+/**
+ * Build the HTTP response from a CommandResult. Pure function so it can be
+ * unit-tested without spinning up the server (#1440). Defense in depth on top
+ * of handleCommandInternal's choke-point sanitization: this catches any
+ * \uXXXX JSON-escape surrogate forms that the raw-codepoint regex above
+ * misses when the body has already been JSON-stringified.
+ */
+export function buildCommandResponse(cr: CommandResult): Response {
const contentType = cr.json ? 'application/json' : 'text/plain';
-return new Response(cr.result, {
+const safeBody = typeof cr.result === 'string' ? sanitizeBody(cr.result, !!cr.json) : cr.result;
+return new Response(safeBody, {
status: cr.status,
headers: { 'Content-Type': contentType, ...cr.headers },
});
}
+/** HTTP wrapper — converts CommandResult to Response */
+async function handleCommand(body: any, tokenInfo?: TokenInfo | null): Promise<Response> {
+const cr = await handleCommandInternal(body, tokenInfo);
+return buildCommandResponse(cr);
+}
async function shutdown(exitCode: number = 0) {
if (isShuttingDown) return;
isShuttingDown = true;
@@ -2017,10 +2030,13 @@ export async function start() {
tokenInfo,
{ skipRateCheck: true, skipActivity: true },
);
+// Sanitize lone surrogates per-result (#1440 — /batch bypasses the
+// handleCommand chokepoint, so it needs its own sanitization).
+const safeResult = typeof cr.result === 'string' ? sanitizeBody(cr.result, !!cr.json) : cr.result;
results.push({
index: i,
status: cr.status,
-result: cr.result,
+result: safeResult,
command: cmd.command,
tabId: cmd.tabId,
});
@@ -2040,13 +2056,17 @@ export async function start() {
clientId: tokenInfo?.clientId,
});
-return new Response(JSON.stringify({
+// Sanitize the JSON envelope a second time (defense in depth) — catches
+// any \uXXXX escape sequences for lone surrogates that survived the
+// per-result pass.
+const batchBody = stripLoneSurrogateEscapes(JSON.stringify({
results,
duration,
total: commands.length,
succeeded: results.filter(r => r.status === 200).length,
failed: results.filter(r => r.status !== 200).length,
-}), {
+}));
+return new Response(batchBody, {
status: 200,
headers: { 'Content-Type': 'application/json' },
});
+4 -3
@@ -22,6 +22,7 @@ import type { TabSession, RefEntry } from './tab-session';
import * as Diff from 'diff';
import { TEMP_DIR, isPathWithin } from './platform';
import { escapeEnvelopeSentinels } from './content-security';
+import { stripLoneSurrogates } from './sanitize';
// Roles considered "interactive" for the -i flag
const INTERACTIVE_ROLES = new Set([
@@ -576,7 +577,7 @@ export async function handleSnapshot(
}
session.setLastSnapshot(snapshotText);
-return diffOutput.join('\n');
+return stripLoneSurrogates(diffOutput.join('\n'));
}
// Store for future diffs
@@ -623,8 +624,8 @@ export async function handleSnapshot(
parts.push('═══ BEGIN UNTRUSTED WEB CONTENT ═══');
parts.push(...safeUntrusted);
parts.push('═══ END UNTRUSTED WEB CONTENT ═══');
-return parts.join('\n');
+return stripLoneSurrogates(parts.join('\n'));
}
-return output.join('\n');
+return stripLoneSurrogates(output.join('\n'));
}