docs: post-ship sync for v1.38.0.0

Document the two architectural invariants that landed in v1.38.0.0 in
their persistent homes (not just CHANGELOG):

- README Windows section: add the `./setup` re-run-after-git-pull
  requirement that `_print_windows_copy_note_once` shows at runtime.
- CONTRIBUTING "Things to know": add the no-raw-`ln` invariant for
  contributors editing `setup`, with the test that enforces it.
- ARCHITECTURE: new "Unicode sanitization at server egress" section
  between Shell injection prevention and Prompt injection defense,
  with egress table (HTTP/batch/SSE) and the post-stringify-regex
  rationale.
- CLAUDE.md: cross-references for both invariants, matching the
  v1.6.0.0 dual-listener pattern (each constraint says which files
  to read before editing and which test pins it).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-05-14 14:05:26 -07:00
parent 0c3bd893e5
commit 6d330f5150
4 changed files with 43 additions and 0 deletions
+15
View File
@@ -144,6 +144,21 @@ Cookies are the most sensitive data gstack handles. The design:
The browser registry (Comet, Chrome, Arc, Brave, Edge) is hardcoded. Database paths are constructed from known constants, never from user input. Keychain access uses `Bun.spawn()` with explicit argument arrays, not shell string interpolation.
### Unicode sanitization at server egress (v1.38.0.0)
Page content harvested by CDP can contain lone UTF-16 surrogate halves (orphaned high or low surrogates from broken JavaScript string handling on the page). When those reach `JSON.stringify`, Bun emits them as `\uD800`-style escape sequences that the downstream consumer's `JSON.parse` accepts, but the Anthropic API rejects with a 400 — turning a single weird page into a session-killing error. Defense is single-point, applied at every server egress that ships page-derived strings.
| Egress path | Module | Sanitization point |
|---|---|---|
| `POST /command` (HTTP) | `browse/src/server.ts` | `handleCommandInternal` wrapper (sanitizes the result of `handleCommandInternalImpl`) |
| `POST /command/batch` | `browse/src/server.ts` | Same wrapper — batch consumers inherit it |
| `GET /activity/stream` (SSE) | `browse/src/server.ts` | `sanitizeReplacer` passed to `JSON.stringify` |
| `GET /inspector/events` (SSE) | `browse/src/server.ts` | `sanitizeReplacer` passed to `JSON.stringify` |
`sanitizeReplacer` is a `JSON.stringify` replacer function that cleans every string value during encoding. Post-stringify regex doesn't work here — `JSON.stringify` has already converted `\uD800` into the literal escape sequence `"\\ud800"` before the regex could match, so the replacer must run inside the encoding pipeline. The pure-string helper `sanitizeLoneSurrogates` is used directly for `text/plain` responses.
**Architectural invariant.** Every new SSE/WebSocket writer or HTTP response that ships page-content-derived strings MUST go through one of two paths: `JSON.stringify(payload, sanitizeReplacer)` for object payloads, or `sanitizeLoneSurrogates(body)` for text bodies. New surfaces that bypass both will desync the system. Inline comments at both SSE producers in `server.ts` say so; `browse/test/server-sanitize-surrogates.test.ts` pins wiring with bug-repro + invariant tests (`handleCommandInternalImpl` rename, central sanitization line, replacer existence, SSE producers stringify with replacer).
### Prompt injection defense (sidebar agent)
The Chrome sidebar agent has tools (Bash, Read, Glob, Grep, WebFetch) and reads hostile web pages, so it's the part of gstack most exposed to prompt injection. Defense is layered, not single-point.