docs: post-ship sync for v1.38.0.0

Document the two architectural invariants that landed in v1.38.0.0 in
their persistent homes (not just CHANGELOG):

- README Windows section: add the `./setup` re-run-after-git-pull
  requirement that `_print_windows_copy_note_once` shows at runtime.
- CONTRIBUTING "Things to know": add the no-raw-`ln` invariant for
  contributors editing `setup`, with the test that enforces it.
- ARCHITECTURE: new "Unicode sanitization at server egress" section
  between Shell injection prevention and Prompt injection defense,
  with egress table (HTTP/batch/SSE) and the post-stringify-regex
  rationale.
- CLAUDE.md: cross-references for both invariants, matching the
  v1.6.0.0 dual-listener pattern (each constraint says which files
  to read before editing and which test pins it).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Garry Tan
2026-05-14 14:05:26 -07:00
parent 0c3bd893e5
commit 6d330f5150
4 changed files with 43 additions and 0 deletions
+15
@@ -144,6 +144,21 @@ Cookies are the most sensitive data gstack handles. The design:
The browser registry (Comet, Chrome, Arc, Brave, Edge) is hardcoded. Database paths are constructed from known constants, never from user input. Keychain access uses `Bun.spawn()` with explicit argument arrays, not shell string interpolation.
### Unicode sanitization at server egress (v1.38.0.0)
Page content harvested via CDP can contain lone UTF-16 surrogate halves (orphaned high or low surrogates left by broken JavaScript string handling on the page). When those reach `JSON.stringify`, Bun emits them as `\uD800`-style escape sequences. The downstream consumer's `JSON.parse` accepts these escapes, but the Anthropic API rejects them with a 400, so a single malformed page becomes a session-killing error. The defense is single-point: one sanitization step, applied at every server egress that ships page-derived strings.
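The TypeScript illustration below shows the escape behavior. It relies only on standard well-formed `JSON.stringify` (ES2019 and later), and the sample string is invented:

```typescript
// An orphaned high surrogate, as broken page-side string handling might produce.
const pageText = "price: 10\uD800 USD";

// Well-formed JSON.stringify does not emit the raw code unit; it writes the
// six-character escape \ud800 (lowercase hex) into the output instead.
const wire = JSON.stringify({ text: pageText });
console.log(wire); // {"text":"price: 10\ud800 USD"}

// A JSON.parse consumer accepts the escape and restores the lone surrogate,
// so the bad string travels on silently until a stricter consumer rejects it.
const decoded = JSON.parse(wire) as { text: string };
console.log(decoded.text === pageText); // true
```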
| Egress path | Module | Sanitization point |
|---|---|---|
| `POST /command` (HTTP) | `browse/src/server.ts` | `handleCommandInternal` wrapper (sanitizes the result of `handleCommandInternalImpl`) |
| `POST /command/batch` | `browse/src/server.ts` | Same wrapper — batch consumers inherit it |
| `GET /activity/stream` (SSE) | `browse/src/server.ts` | `sanitizeReplacer` passed to `JSON.stringify` |
| `GET /inspector/events` (SSE) | `browse/src/server.ts` | `sanitizeReplacer` passed to `JSON.stringify` |
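The two `POST` rows follow a rename-and-wrap pattern. The sketch below rests on several assumptions:

- The handler is synchronous, and its result has a hypothetical `{ status, body }` shape. The real types in `server.ts` are not shown here.
- `handleCommandBatch` is a hypothetical name.
- The sanitizer is a stand-in that replaces lone halves with U+FFFD.

```typescript
// Hypothetical result shape; the real one in browse/src/server.ts may differ.
interface CommandResult {
  status: number;
  body: string;
}

// Stand-in sanitizer: keep valid surrogate pairs, replace lone halves with U+FFFD.
function sanitizeLoneSurrogates(s: string): string {
  return s.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g, (m) =>
    m.length === 2 ? m : "\uFFFD",
  );
}

// The original handler, renamed; it may return page-derived strings verbatim.
function handleCommandInternalImpl(command: string): CommandResult {
  // Simulated page content with an orphaned low surrogate.
  return { status: 200, body: `${command}: Broken\uDC00Title` };
}

// Single sanitization point: HTTP callers use this name, never the Impl.
function handleCommandInternal(command: string): CommandResult {
  const result = handleCommandInternalImpl(command);
  return { ...result, body: sanitizeLoneSurrogates(result.body) };
}

// Hypothetical batch entry point: it inherits sanitization by calling the wrapper.
function handleCommandBatch(commands: string[]): CommandResult[] {
  return commands.map((c) => handleCommandInternal(c));
}
```

The point of the rename is that no unsanitized entry point remains under the old name for a new caller to pick by accident.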
`sanitizeReplacer` is a `JSON.stringify` replacer function that cleans every string value during encoding. A regex pass over the stringified output doesn't work here. By then, `JSON.stringify` has already converted `\uD800` into the literal escape sequence `"\\ud800"`, so a regex for raw surrogates has nothing left to match. The replacer must therefore run inside the encoding pipeline. For `text/plain` responses, the pure-string helper `sanitizeLoneSurrogates` is applied directly.
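The example below contrasts cleaning inside the encoder with cleaning afterwards. The helper bodies are a sketch that assumes lone halves are replaced with U+FFFD. The real `sanitizeLoneSurrogates` may strip them or use a different replacement.

```typescript
// Matches a valid surrogate pair first, otherwise any single surrogate code unit.
// No `u` flag, so the pattern operates on raw UTF-16 code units.
const SURROGATE = /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g;

// Keep pairs, replace lone halves (assumed replacement: U+FFFD).
function sanitizeLoneSurrogates(s: string): string {
  return s.replace(SURROGATE, (m) => (m.length === 2 ? m : "\uFFFD"));
}

// JSON.stringify calls the replacer for every value, nested ones included,
// before quoting it. Keys are not rewritten: a replacer can only change values.
function sanitizeReplacer(_key: string, value: unknown): unknown {
  return typeof value === "string" ? sanitizeLoneSurrogates(value) : value;
}

const payload = { title: "Broken\uD800Title", tags: ["ok", "\uDFFF"] };

// After stringify, the lone halves are already the text \ud800 and \udfff,
// so a regex over raw surrogate code units finds nothing to fix.
const naive = JSON.stringify(payload);
const rawSurrogate = /[\uD800-\uDFFF]/;
console.log(rawSurrogate.test(naive)); // false: no raw code units remain
console.log(naive.includes("\\ud800")); // true: the escape still ships

// Inside the pipeline, the replacer cleans each string before it is encoded.
const safe = JSON.stringify(payload, sanitizeReplacer);
console.log(safe.includes("\\ud800")); // false
```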
**Architectural invariant.** Every new SSE/WebSocket writer or HTTP response that ships page-content-derived strings MUST go through one of two paths: `JSON.stringify(payload, sanitizeReplacer)` for object payloads, or `sanitizeLoneSurrogates(body)` for text bodies. A new surface that bypasses both reopens the 400 failure mode described above. Inline comments at both SSE producers in `server.ts` say so. `browse/test/server-sanitize-surrogates.test.ts` pins the wiring with a bug-repro test plus invariant tests that cover four points:

- the `handleCommandInternalImpl` rename
- the central sanitization line
- the existence of the replacer
- both SSE producers stringifying with the replacer
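For a new SSE producer, the invariant comes down to one call site. In this minimal sketch, `formatSseEvent` and `textBody` are hypothetical names, not functions from `server.ts`. The two helpers are compact stand-ins that assume the same U+FFFD replacement behavior.

```typescript
// Compact stand-ins (assumed behavior; the real helpers live in browse/src/server.ts).
const sanitizeLoneSurrogates = (s: string): string =>
  s.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g, (m) =>
    m.length === 2 ? m : "\uFFFD",
  );
const sanitizeReplacer = (_key: string, value: unknown): unknown =>
  typeof value === "string" ? sanitizeLoneSurrogates(value) : value;

// Path 1, object payloads: stringify with the replacer. Without a `space` argument,
// JSON.stringify never emits a raw newline, so the payload fits on one `data:` line.
function formatSseEvent(payload: unknown): string {
  return `data: ${JSON.stringify(payload, sanitizeReplacer)}\n\n`;
}

// Path 2, text bodies: run the string helper over the whole body.
function textBody(body: string): string {
  return sanitizeLoneSurrogates(body);
}

// A bypass would look like `data: ${JSON.stringify(payload)}\n\n`, with no replacer.
```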
### Prompt injection defense (sidebar agent)
The Chrome sidebar agent has tools (Bash, Read, Glob, Grep, WebFetch) and reads hostile web pages, so it's the part of gstack most exposed to prompt injection. Defense is layered, not single-point.
+25
@@ -269,6 +269,31 @@ to `~/.gstack/security/attempts.jsonl` via `tunnel-denial-log.ts`. Before editin
the module boundary (no imports from `token-registry.ts` into `sse-session-cookie.ts`)
is load-bearing for scope isolation.
**Unicode sanitization at server egress** (v1.38.0.0+). Every server egress that
ships page-content-derived strings MUST go through `JSON.stringify(payload,
sanitizeReplacer)` for object payloads or `sanitizeLoneSurrogates(body)` for text
bodies. Lone UTF-16 surrogate halves from CDP page content otherwise reach the
Anthropic API as `\uD800`-style escapes and trigger a 400. Wired at four egress
points today: `handleCommandInternal` (HTTP + batch via a sanitizing wrapper around
`handleCommandInternalImpl`) and both SSE producers (`/activity/stream`,
`/inspector/events`). Post-stringify regex is a no-op: `JSON.stringify` has
already escaped the surrogate before a regex could match, so the replacer must run
inside the encoding pipeline. Before adding a new SSE/WebSocket writer or HTTP
response in `server.ts`, read
[ARCHITECTURE.md](ARCHITECTURE.md#unicode-sanitization-at-server-egress-v13800).
`browse/test/server-sanitize-surrogates.test.ts` pins the wiring with invariant
tests, so bypasses fail CI.
**Setup symlink hardening** (v1.38.0.0+). Every link site in `setup` MUST route
through the `_link_or_copy SRC DST` helper near the `IS_WINDOWS` detection. On
Windows without Developer Mode, plain `ln -snf` produces frozen file copies that
don't refresh on `git pull` — silent staleness across every host adapter. The
helper preserves `ln -snf` on Unix and switches to `cp -R` / `cp -f` on Windows.
`test/setup-windows-fallback.test.ts` enforces a static invariant: even one raw
`ln` call outside the helper body fails CI. Windows users get a one-line note
from `_print_windows_copy_note_once` reminding them to re-run `./setup` after
every `git pull`.
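The static invariant can be approximated with a line-based scan, sketched below in TypeScript. The sketch makes several assumptions:

- The helper is defined as a shell function whose body closes with `}` at column 0.
- `findRawLnCalls` is a hypothetical name, and the real `test/setup-windows-fallback.test.ts` may parse `setup` differently.
- The sample `setup` fragment is invented and simplifies the helper.
- The check is deliberately conservative, so it may also flag `ln` inside quoted strings.

```typescript
// Returns 1-based line numbers of `ln` invocations outside the _link_or_copy body.
function findRawLnCalls(script: string): number[] {
  const offenders: number[] = [];
  let inHelper = false;
  script.split("\n").forEach((line, index) => {
    // Naive comment stripping: a '#' inside quotes or ${#var} also cuts the line.
    const code = line.replace(/#.*$/, "");
    if (/^\s*(function\s+)?_link_or_copy\s*(\(\))?\s*\{/.test(code)) {
      inHelper = true; // raw ln is allowed only inside the helper body
      return;
    }
    if (inHelper) {
      if (/^\}\s*$/.test(code)) inHelper = false; // assumed: body closes with '}' at column 0
      return;
    }
    // `ln` as a command word: at line start or after whitespace, ';', '&', '|' or '('.
    if (/(^|[\s;&|(])ln(\s|$)/.test(code)) offenders.push(index + 1);
  });
  return offenders;
}

const sample = [
  "_link_or_copy() {",
  '  if [ "$IS_WINDOWS" = 1 ]; then cp -R "$1" "$2"; else ln -snf "$1" "$2"; fi',
  "}",
  '_link_or_copy "$SRC/browse" "$DST/browse"',
  'ln -snf "$SRC/qa" "$DST/qa"  # raw call: flagged',
].join("\n");

const offenders = findRawLnCalls(sample); // [5]
```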
**Sidebar security stack** (layered defense against prompt injection):
| Layer | Module | Lives in |
+1
@@ -342,6 +342,7 @@ When Conductor creates a new workspace, `bin/dev-setup` runs automatically. It d
- **Conductor workspaces are independent.** Each workspace is its own git worktree. `bin/dev-setup` runs automatically via `conductor.json`.
- **`.env` propagates across worktrees.** Set it once in the main repo, all Conductor workspaces get it.
- **`.claude/skills/` is gitignored.** The symlinks never get committed.
- **Never write raw `ln -snf` in `setup`.** Every link site in `setup` MUST route through the `_link_or_copy SRC DST` helper near the `IS_WINDOWS` detection. The helper preserves `ln -snf` on Unix and switches to `cp -R` / `cp -f` on Windows without Developer Mode, where plain `ln -snf` produces frozen file copies that don't refresh on `git pull`. `test/setup-windows-fallback.test.ts` enforces this with a static invariant: even one raw `ln` call outside the helper body fails CI.
## Testing your changes in a real project
+2
@@ -459,6 +459,8 @@ Data is stored in [Supabase](https://supabase.com) (open source Firebase alterna
**Windows users:** gstack works on Windows 11 via Git Bash or WSL. Node.js is required in addition to Bun — Bun has a known bug with Playwright's pipe transport on Windows ([bun#4253](https://github.com/oven-sh/bun/issues/4253)). The browse server automatically falls back to Node.js. Make sure both `bun` and `node` are on your PATH.
On Windows without Developer Mode (MSYS2 / Git Bash), `setup` falls back to file copies instead of symlinks because `ln -snf` produces frozen copies that don't refresh on `git pull`. **Re-run `cd ~/.claude/skills/gstack && ./setup` after every `git pull`** so your skill files match the repo. `setup` prints a one-line note reminding you. Unix and WSL keep symlinks and don't need the re-run.
**Claude says it can't see the skills?** Make sure your project's `CLAUDE.md` has a gstack section. Add this:
```