mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-20 08:40:11 +02:00
Merge origin/main into garrytan/upgrade-gstack-gbrain-v1
Catch up to main (1.52.0.0, plan-tune cathedral + browse memory work). Branch bumps to 1.52.1.0 — PATCH above main. Conflict resolutions: - VERSION / package.json → 1.52.1.0 (monotonic above main's 1.52.0.0) - CHANGELOG.md → reconstructed reverse-chronological: this branch's brain-aware-planning + save-results entry renumbered 1.51.1.0 → 1.52.1.0 on top, then main's 1.52.0.0 / 1.51.0.0 / 1.49.0.0 entries, then shared history. No entries dropped or orphaned. - setup → kept both endgame blocks (my gbrain detection + main's plan-tune cathedral hook install); they're independent. - SKILL.md files → regenerated from merged templates via bun run gen:skill-docs (canonical no-gbrain), not accepted from either merge side, per CLAUDE.md. Idempotent (0 STALE on re-run). - bin/gstack-config → both sides' additions present (main's GSTACK_STATE_ROOT support + this branch's gbrain-refresh subcommand). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -1,5 +1,140 @@
|
||||
# TODOS
|
||||
|
||||
## gbrowser memory follow-ups (filed via /plan-eng-review + /codex on the v1.49 leak-fix PR)
|
||||
|
||||
These four items came out of the memory-leak investigation that shipped
|
||||
the `$B memory` diagnostic + the four leak fixes. They were
|
||||
deliberately deferred from that PR (already 14 commits / ~12 files);
|
||||
each stands alone and any one could ship independently.
|
||||
|
||||
### P2: MV3 extension service worker memory profile
|
||||
|
||||
**What:** The `/memory` endpoint snapshot enumerates pages but does
|
||||
not enumerate the gstack baked-in extension's service-worker target.
|
||||
A long-running MV3 service worker can leak through retained DOM
|
||||
snapshots, message ports that never close, alarms that re-arm, and
|
||||
caches that grow without bound. The diagnostic should call
|
||||
`Target.getTargets` with a filter for `service_worker` and include
|
||||
each one in `tabs[]` (or a sibling `serviceWorkers[]` array) with the
|
||||
same `Performance.getMetrics` data.
|
||||
|
||||
**Why:** Codex's outside-voice review on the eng-review surfaced this
|
||||
class of leak (the extension is part of the gbrowser process tree but
|
||||
invisible to today's snapshot). Until we surface it, a SW leak shows
|
||||
up only in the parent process RSS with no per-target attribution.
|
||||
|
||||
**Pros:** Closes the per-target attribution gap for the
|
||||
single-most-likely future leak source (our own extension).
|
||||
**Cons:** Extension SW lifecycle is asymmetric vs page lifecycle;
|
||||
auto-attach + filter is one more piece of CDP plumbing.
|
||||
|
||||
**Context:** Codex finding #4 on the eng-review outside voice. Not
|
||||
in scope of the v1.49 PR; deliberately deferred to keep the PR to
|
||||
the four highest-confidence leak fixes.
|
||||
|
||||
**Priority:** P2. **Effort:** M.
|
||||
|
||||
---
|
||||
|
||||
### P2: Native + GPU memory breakdown in `$B memory`
|
||||
|
||||
**What:** `$B memory` shows Bun RSS + per-tab JS heap + Chromium
|
||||
process tree (PIDs + types + CPU time) but the per-process RSS is
|
||||
absent — `SystemInfo.getProcessInfo` doesn't expose RSS and the eng
|
||||
review (D2 USE_CDP) explicitly chose CDP over shelling to `ps`. The
|
||||
honest next step is to surface what CDP DOES give for the other
|
||||
memory categories: `Memory.getDOMCounters` per target (node + listener
|
||||
counts), `SystemInfo.getInfo` for GPU memory, `Memory.getAllTimeSamplingProfile`
|
||||
for a sampled native estimate.
|
||||
|
||||
**Why:** Codex's outside-voice review flagged that
|
||||
`Performance.getMetrics` misses native memory, GPU memory, video
|
||||
buffers, Skia, network cache, extension process RSS, and
|
||||
browser-process RSS — all the categories where a 160 GB leak would
|
||||
actually live. A diagnostic that misses the categories where the
|
||||
leak class lives undersells itself.
|
||||
|
||||
**Pros:** Per-process category breakdown closes the gap between
|
||||
"Activity Monitor says 160 GB" and what the diagnostic shows.
|
||||
**Cons:** Each CDP method has its own quirks; this is a real
|
||||
implementation pass, not a one-line addition.
|
||||
|
||||
**Context:** Codex finding #5 on the eng-review outside voice. Not
|
||||
in scope of the v1.49 PR; deliberately deferred.
|
||||
|
||||
**Priority:** P2. **Effort:** M.
|
||||
|
||||
---
|
||||
|
||||
### P3: Single-context CDP listener for Network.loadingFinished
|
||||
|
||||
**What:** `wirePageEvents` attaches a `page.on('requestfinished')`
|
||||
listener PER PAGE. The D10 fix removed the body-materialization leak
|
||||
inside that listener but kept the per-page listener architecture
|
||||
(7 listeners attached per tab — close, framenavigated, dialog,
|
||||
console, request, response, requestfinished). The stretch goal from
|
||||
D10 was to replace the per-page `requestfinished` listener with a
|
||||
single context-level CDP listener via
|
||||
`Target.setAutoAttach({autoAttach: true, waitForDebuggerOnStart: false,
|
||||
flatten: true})` and a browser-wide `Network.loadingFinished` event
|
||||
handler.
|
||||
|
||||
**Why:** Going from N to 1 listener for the request-size capture is
|
||||
structurally the right architecture and removes one piece of per-tab
|
||||
memory pressure. The body-materialization fix already addressed the
|
||||
acute leak; this is the architectural cleanup that prevents similar
|
||||
leaks in the same class.
|
||||
|
||||
**Pros:** One listener per browser instead of one per tab.
|
||||
**Cons:** `Target.setAutoAttach` plumbing is more code than the
|
||||
straight per-page listener; the marginal memory win is small on top
|
||||
of the body-fetch fix that already landed.
|
||||
|
||||
**Context:** D10 stretch goal on the eng-review. The minimal-risk
|
||||
fix shipped in v1.49 (replaces `await res.body()` with
|
||||
`await req.sizes()`, preserving the per-page listener); this is the
|
||||
architectural follow-up.
|
||||
|
||||
**Priority:** P3. **Effort:** M-L.
|
||||
|
||||
---
|
||||
|
||||
### P3: Real-Chromium peak-RSS reproducer (periodic tier)
|
||||
|
||||
**What:** The gate-tier reproducer
|
||||
(`browse/test/memory-leak-reproducer.test.ts`) pins the invariant
|
||||
that `res.body()` is never called during a burst of
|
||||
`requestfinished` events. It uses a fake page; it does NOT spin up a
|
||||
real Chromium nor measure peak Bun RSS during a real concurrent fetch
|
||||
burst. A periodic-tier follow-up should: spin up a real headless
|
||||
Chromium, navigate to a fixture page that concurrently fetches 500
|
||||
mixed responses (small JSON, 100 KB images, 10 MB chunked,
|
||||
gzip-compressed 2 MB), sample `process.memoryUsage().heapUsed` every
|
||||
100 ms during the burst, assert `peak_heap < 200 MB above baseline`
|
||||
AND `post-gc_heap < 30 MB above baseline`. Also include a single-tab
|
||||
WebGL canvas variant that grows to >4 GB and asserts the per-tab RSS
|
||||
toast fires.
|
||||
|
||||
**Why:** Codex flagged that the leak's real failure mode is transient
|
||||
amplification under concurrent burst, not retained leak — a steady-state
|
||||
heap test misses it. The fake-page gate-tier test catches the
|
||||
listener-architecture regression; the periodic real-browser test
|
||||
catches the actual peak-RSS class.
|
||||
|
||||
**Pros:** Closes the "did we actually demonstrate the OOM is fixed"
|
||||
question with hard numbers. Feeds the ANGLE_B_NUMBERS CHANGELOG
|
||||
release-summary table.
|
||||
**Cons:** Periodic tier costs minutes of CI time and money per run;
|
||||
real-browser memory tests are inherently flaky.
|
||||
|
||||
**Context:** Codex outside-voice finding on the eng-review; D7
|
||||
ANGLE_B_NUMBERS CHANGELOG framing needs this reproducer's numbers
|
||||
before /ship time.
|
||||
|
||||
**Priority:** P3. **Effort:** M.
|
||||
|
||||
---
|
||||
|
||||
## design daemon: follow-ups (filed v1.45.0.0 via /ship review army)
|
||||
|
||||
### ✅ DONE (v1.45.0.0): Tighten daemon test coverage
|
||||
@@ -582,7 +717,24 @@ reads it yet.
|
||||
|
||||
**Effort:** L (human: ~1 week / CC: ~4h)
|
||||
**Priority:** P0
|
||||
**Depends on:** 2+ weeks of v1 dogfood, profile diversity check passing.
|
||||
**Depends on:** **90+ days of v1 dogfood stable across 3+ skills** (per
|
||||
`docs/designs/PLAN_TUNING_V0.md` §"Deferred to v2" E1 acceptance criteria).
|
||||
Distinct from the lighter-weight diversity-display gate
|
||||
(`sample_size >= 20 AND skills_covered >= 3 AND question_ids_covered >= 8
|
||||
AND days_span >= 7`) used in /plan-tune to render the inferred column —
|
||||
display is a UI affordance, promotion to E1 needs a much higher bar
|
||||
because behavioral adaptation is consequential and hard to revert. Prior
|
||||
versions of this card cited "2+ weeks" which conflicted with V0 — V0 wins.
|
||||
|
||||
**Substrate risk (Codex outside-voice, Phase A review 2026-05-26):** Generated
|
||||
skill prose is agent-compliance-based. Tests can verify templates contain the
|
||||
right reads of `~/.gstack/developer-profile.json` and the right decision
|
||||
points, but tests cannot prove agents obey them at runtime. E1 ships
|
||||
adaptations as **advisory annotations on AskUserQuestion recommendations**
|
||||
("Recommended via your profile: <choice>") until there's a hard runtime
|
||||
execution path. Do NOT gate any AUTO_DECIDE on inferred profile alone in v1
|
||||
of E1; explicit per-question preferences remain the only AUTO_DECIDE
|
||||
source.
|
||||
|
||||
### E3 — `/plan-tune narrative` + `/plan-tune vibe`
|
||||
|
||||
|
||||
Reference in New Issue
Block a user