Merge origin/main into garrytan/upgrade-gstack-gbrain-v1

Catch up to main (1.52.0.0, plan-tune cathedral + browse memory work).
Branch bumps to 1.52.1.0 — PATCH above main.

Conflict resolutions:
- VERSION / package.json → 1.52.1.0 (monotonic above main's 1.52.0.0)
- CHANGELOG.md → reconstructed reverse-chronological: this branch's
  brain-aware-planning + save-results entry renumbered 1.51.1.0 →
  1.52.1.0 on top, then main's 1.52.0.0 / 1.51.0.0 / 1.49.0.0 entries,
  then shared history. No entries dropped or orphaned.
- setup → kept both endgame blocks (my gbrain detection + main's
  plan-tune cathedral hook install); they're independent.
- SKILL.md files → regenerated from merged templates via
  bun run gen:skill-docs (canonical no-gbrain), not accepted from
  either merge side, per CLAUDE.md. Idempotent (0 STALE on re-run).
- bin/gstack-config → both sides' additions present (main's
  GSTACK_STATE_ROOT support + this branch's gbrain-refresh subcommand).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-05-28 21:04:26 -07:00
113 changed files with 8705 additions and 314 deletions
+1
View File
@@ -317,6 +317,7 @@ from `snapshot`, or `@c` refs from `snapshot -C`. Full table:
| `disconnect` | Close headed Chrome, return to headless |
| `focus [@ref]` | Bring headed Chrome to foreground (macOS); `@ref` also scrolls into view |
| `state save\|load <name>` | Save or load browser state (cookies + URLs) |
| `memory [--json]` | Snapshot Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes. Use `--json` for programmatic consumers; text mode renders sorted top-10 tabs with "and N more" tail. |
### Handoff
+145 -1
View File
@@ -1,6 +1,6 @@
# Changelog
## [1.51.1.0] - 2026-05-27
## [1.52.1.0] - 2026-05-27
## **Brain-aware planning lands. Five planning skills read structured context from any personal gbrain before asking — same questions, smarter answers, no token tax.**
@@ -78,6 +78,150 @@ Coverage: a free resolver-level unit test pins per-skill slug + tag metadata + t
- The default `bun run gen:skill-docs` (CI canonical) ignores the detection file. Committed SKILL.md stays reproducible regardless of any developer's local gbrain state. Use `bun run gen:skill-docs:user` for user-local installs.
- Two follow-ups deferred to `TODOS.md` (P2): re-verify calibration takes when gbrain v0.42+ ships `takes_add` (the `BRAIN_CALIBRATION_WRITEBACK` flag flips); extend the brain-writeback E2E to the other 4 planning skills.
## [1.52.0.0] - 2026-05-27
## **`/plan-tune` settings actually do something now. Hooks make capture deterministic, preferences binding, and free-text answers loop back as memory.**
Before this release, plan-tune was a profile inspector with a hollow substrate. Every gstack skill told the agent "log this AskUserQuestion fire," and in weeks of dogfood, zero events ever landed. Preferences were agent-honored convention. Declared profile dimensions sat in a JSON file doing nothing. After this release: a PostToolUse hook captures every AUQ fire whether the agent remembers to log or not. A PreToolUse hook substitutes auto-decided answers when you've set `never-ask`. Free-text "Other" responses get dream-cycled through Claude into structured proposals you approve, then injected into future related questions as inline context. Codex sessions are backfilled by a structured-JSONL parser, not regex on transcript text.
The cathedral lands behind one explicit consent prompt at `./setup` (with diff preview, backup, and one-command rollback) and stays on once installed.
### The numbers that matter
Measured against the existing v1.49 substrate. Reproduce with `bun test test/plan-tune-gates.test.ts test/question-log-hook.test.ts test/question-preference-hook.test.ts test/memory-cache-injection.test.ts test/distill-free-text.test.ts test/distill-apply.test.ts test/declared-annotation.test.ts test/gstack-codex-session-import.test.ts test/skill-e2e-plan-tune-cathedral.test.ts`.
| Metric | Before (v1.49.0.0) | After (v1.52.0.0) | Δ |
|---|---|---|---|
| AUQ events captured per session | 0 (agent convention) | every fire (hook) | substrate works |
| `never-ask` preferences enforced | 0% (agent convention) | 100% (hook + deny+reason) | actually binds |
| Declared profile annotations | 0 / week | every signal_key match | profile renders |
| Dream-cycle memory persistence | 0 (no mechanism) | per-project + gbrain mirror | cross-project recall |
| Codex session backfill | none (regex idea) | structured JSONL parser | future-proof |
| Per-PR test cost added | $0 | $0 (deterministic; no claude -p) | gate-tier safe |
| Unit + E2E tests added | — | 96 tests / 8 new files | green |
| Layer | What it does | Where it lives |
|---|---|---|
| 1 — Capture | PostToolUse hook → question-log.jsonl with dedup + async derive | hosts/claude/hooks/question-log-hook.ts |
| 2 — Enforcement | PreToolUse hook → deny+reason with auto-decided option | hosts/claude/hooks/question-preference-hook.ts |
| 3 — Annotation | declared profile → kebab signal_key → plain-English phrase | scripts/declared-annotation.ts |
| 4 — Surfaces | host-aware Stats, Recent auto-decisions, Audit unmarked | plan-tune/SKILL.md.tmpl |
| 5 — Discoverability | setup hook-install prompt + post-ship nudge | setup, ship/SKILL.md.tmpl |
| 6 — Tests | 5 E2E scenarios, all gate tier, $0 cost | test/skill-e2e-plan-tune-cathedral.test.ts |
| 7 — Installation | schema-aware bin: PreToolUse + PostToolUse, backup + rollback | bin/gstack-settings-hook |
| 8 — Dream cycle | Anthropic SDK distill + gbrain put_page + memory injection | bin/gstack-distill-* + Layer 2 inject |
Highest-impact number is the third row: declared profile annotations now render inline before every AUQ that matches a signal_key. Set `declared.scope_appetite = 0.85` once during /plan-tune setup, and every "should I bundle this fix?" question shows up with "(your profile leans complete-implementation)" on the recommended option. The same loop applies to verbose-vs-terse, consult-vs-delegate, and ship-now-vs-get-the-design-right.
### What this means for solo builders
The feature compounds now. Each AskUserQuestion you answer "Other" with free text gets captured by the hook, batched into proposals by `gstack-distill-free-text` (3/day cap, ~$0.01 per run), reviewed via `/plan-tune distill`, and applied as either a `never-ask` preference, a declared-profile nudge, or a reusable memory nugget that routes to your gbrain (when configured) and reappears as context the next time a related question fires. The dream cycle is the unlock — without it, every nuanced answer evaporated after one turn. Now they accumulate. Run `./setup` and accept the hook-install prompt to turn it on, then `/plan-tune` whenever you want to see what your profile knows about you.
### Itemized changes
**Added**
- `hosts/claude/hooks/question-log-hook` — PostToolUse hook, matcher covers `AskUserQuestion` + `mcp__*__AskUserQuestion`. Captures every AUQ fire with marker-first question_id (D18), hash-fallback observed-only, source-tagged.
- `hosts/claude/hooks/question-preference-hook` — PreToolUse hook with `(recommended)`-label parser, refuse-on-ambiguous (D2 safety), project-then-global preference precedence (D8), one-way safety override. Auto-decided events logged from the hook itself since deny prevents PostToolUse from firing.
- `scripts/declared-annotation.ts``getDeclaredAnnotation(signal_key)` with kebab→underscore namespace mapping. Returns null in the middle band, plain-English phrase in strong bands (>= 0.7 or <= 0.3).
- `bin/gstack-codex-session-import` — structured JSONL parser for `~/.codex/sessions/`. Marker-first recovery with pattern fallback, source-tagged `codex-import-marker` / `codex-import-pattern`.
- `bin/gstack-distill-free-text` — Layer 8 dream cycle distiller. Anthropic SDK direct call (Haiku 4.5), 3/day rate cap per slug (D7), cumulative cost log, sync-or-background execution context (D14).
- `bin/gstack-distill-apply` — applies one approved proposal to its surface (preference / declared-nudge / memory-nugget), with optional `--gbrain-published true` flag.
- `setup` — interactive consent prompt for hook installation with diff preview, backup, one-command rollback. Marker-gated so users are asked at most once.
- `ship/SKILL.md.tmpl` Step 21 — post-success plan-tune nudge, marker-gated for at-most-once.
- `docs/spikes/claude-code-hook-mutation.md` + `docs/spikes/codex-session-format.md` — Phase 1 spike outputs that pinned protocol contracts before implementation.
- 96 new tests across 8 files: STATE_ROOT honoring, v1.49 gates, settings-hook schema-aware ops, both hooks, declared-annotation, codex import, distill bin, distill apply, memory injection, 5 cathedral E2E scenarios.
**Changed**
- `bin/gstack-settings-hook` schema-aware rewrite: PreToolUse + PostToolUse registration with `_gstack_source` tag for dedup, `add-event` / `remove-source` / `diff-event` / `rollback` / `list-sources` subcommands. Legacy `add`/`remove` SessionStart shape preserved verbatim.
- `bin/gstack-question-log` — accepts source, tool_use_id, free_text; composite dedup on (source, tool_use_id) across last 100 lines (D3); async-fires `gstack-developer-profile --derive` after every successful write (D17 — without this, sample_size stayed 0).
- Three bins (`gstack-question-log`, `gstack-question-preference`, `gstack-developer-profile`) + `gstack-config` now honor `GSTACK_STATE_ROOT` env var as highest-priority override (D16 Codex correction — without this, isolation tests silently wrote to real ~/.gstack).
- `scripts/resolvers/question-tuning.ts` preamble — added marker-embedding convention (`<gstack-qid:{id}>`) and `(recommended)` label convention. Hook enforcement gates on marker presence.
- `scripts/question-registry.ts` — added `signal_key: 'decision-autonomy'` to `land-and-deploy-merge-confirm` and `land-and-deploy-rollback` so the autonomy dimension has a real signal source.
- `scripts/psychographic-signals.ts` — added `decision-autonomy` signal map.
- `plan-tune/SKILL.md.tmpl` — new sections (Recent auto-decisions, Audit unmarked, Dream cycle review, Dream cycle distill); host-aware Stats with source breakdown + MARKED %; Step 0 routing extended with dream-cycle gate.
- `bin/gstack-uninstall` — also cleans up `plan-tune-cathedral`-tagged hooks during uninstall.
**For contributors**
- 4 cross-model tension resolutions during eng review locked in: project preferences win over global (D8), hash IDs are observed-only never preference keys (D18), AUQ matcher covers MCP variants (Codex correction), enforcement uses `permissionDecision: "deny"` + reason instead of `"allow"` + `updatedInput` until the AUQ input shape is verified against real Claude Code (T6 conservative path).
- Plan-review preamble byte budget ratcheted 39000 → 40000 in `test/gen-skill-docs.test.ts` (~700 bytes added by the marker convention).
- 9 Codex outside-voice findings folded directly without re-prompting (matcher correction, derive wiring, settings.json consent, signal_key namespace, etc.).
## [1.51.0.0] - 2026-05-27
## **Long-running browser sessions hold flat RSS on the Bun side. `$B memory` gives every future OOM receipts instead of a screenshot.** Four CDP-resource leak classes closed and pinned with tripwires; a structured diagnostic surfaces Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes in real time.
This release closes four leak classes in the browse server that compounded silently across long sidebar sessions: response-body materialization in the requestfinished listener (multi-GB/hour Buffer churn on media-heavy pages), three undetached CDP session call sites (cdp-bridge, write-commands archive, cdp-inspector), an unbounded modificationHistory array in the CSS inspector, and SSE subscriber cleanup that only fired on the abort edge — TCP-died-without-abort cases (Chromium MV3 service-worker suspend, intermediate proxy half-close) left subscribers in the Set forever holding the controller and any queued bytes. All four have invariant tests; a static-grep tripwire fails CI if a future refactor reintroduces direct `newCDPSession(...)` calls outside the helper module.
Alongside the fixes, `$B memory` and `/memory` ship the diagnostic the original 160 GB OOM investigation was missing: Bun RSS + heap breakdown, per-tab JS heap via CDP `Performance.getMetrics`, Chromium process tree via `SystemInfo.getProcessInfo` (PID + type + CPU), and the bounded buffer sizes (modificationHistory, activity subscribers, inspector subscribers, console/network/dialog buffers, capture buffer bytes). The sidebar footer polls `/memory` every 30s with adaptive backoff (drops to 5min if response time exceeds 2s), and a tab-count guardrail fires soft-warn at 50 / hard-warn at 200 with a top-5-by-RAM toast offering one-click close. Single-tab JS heap above 4 GB triggers an immediate toast, catching the WebGL/video runaway case where one tab balloons without the count ever reaching 200.
### The numbers that matter
Source: this branch's 16 commits + the post-merge audit reports. Net diff: 23 files changed, +2251 / -143 = 2394 LOC across browse server (TypeScript), gstack extension (JS/HTML/CSS), and tests.
| Capability | Before this PR | After this PR |
|---|---|---|
| `requestfinished` body handling | `await res.body()` on every response, allocates full body Buffer for one `.length` read | `req.sizes()` reads structured byte count from `Network.loadingFinished`, zero body materialization, accurate for chunked / gzip / streaming responses |
| CDP session lifecycle (3 sites) | direct `newCDPSession`, detach missing or success-path-only | `withCdpSession` (try/finally detach) + `getOrCreateCdpSession` (cached + close-detach) helpers, all 3 sites migrated, static-grep tripwire prevents regression |
| modificationHistory in CSS inspector | unbounded array, grew for every `$B css` edit across the session | bounded FIFO cap 200, evicted-count surfaced in the undo error so the user knows why their target index is gone |
| SSE subscriber cleanup | abort-edge only; TCP-died-without-abort leaked subscriber + controller + queued bytes until process exit | `createSseEndpoint` helper with cleanup on abort + enqueue-throw + heartbeat-throw, idempotent (any edge fires once) |
| Tab-count visibility | none — user could accumulate hundreds of tabs without warning | soft warn at 50 (activity entry), action toast at 200 (top 5 by RAM + Close-selected + Snooze), single-tab >4 GB triggers immediate toast |
| Diagnostic command | not available | `$B memory` (text + `--json`), `/memory` endpoint (SSE-session-cookie gated), sidebar footer with adaptive backoff |
| Net change in `server.ts` (SSE refactor) | 132 lines of inline ReadableStream wiring across two endpoints | 23 lines, both endpoints route through one helper |
| Test pins for the leak class | none specific | 6 new test files, 45 new tests; static-grep tripwire fails CI on regression |
### What this means for builders
The next time you leave a gbrowser session running for days, the Bun side holds its RSS flat instead of churning on per-response Buffer allocations. If a tab does go rogue, the sidebar footer shows you in real time — `RSS: 5.6 GB · 12 tabs`, color-coded — and a 200-tab toast surfaces the top RAM consumers with one-click close before you hit the OS OOM killer. If the next OOM still fires, `$B memory` is there to give it receipts instead of theory: Activity Monitor says 160 GB; the diagnostic tells you which process tree, which tabs, and which in-memory structures are holding it. Every code path the diagnostic measures is also bounded — modificationHistory at 200, console/network/dialog buffers at 50K via the existing CircularBuffer, SSE subscribers via the new cleanup contract — so the bookkeeping itself can't leak.
### Itemized changes
#### Added
- **`$B memory` command** in `browse/src/memory-command.ts` — text mode with sorted top-10 tabs + "and N more" tail; `--json` mode for programmatic consumers and the sidebar footer poll.
- **`/memory` HTTP endpoint** in `browse/src/server.ts` — same SSE-session-cookie auth model as `/activity/stream`. Deliberately NOT extending `/health` (which already leaks AUTH_TOKEN in headed mode per TODOS.md "Audit /health token distribution").
- **`BrowserManager.getMemorySnapshot()`** — collects Bun process memory + per-tab JS heap via `Performance.getMetrics` (lazy per tracked page, swallows target-died errors) + Chromium process tree via `Browser.newBrowserCDPSession()` + `SystemInfo.getProcessInfo`.
- **`browse/src/memory-snapshot.ts`** — shared types (`MemorySnapshot`, `MemoryTabSnapshot`, `MemoryProcess`, `MemoryStructureStats`) plus `formatBytes()` renderer (4 tiers, 2 decimals at GB).
- **`withCdpSession(page, fn)`** and **`getOrCreateCdpSession(page, cache)`** in `browse/src/cdp-bridge.ts` — lifecycle helpers for one-shot and cached CDP work. Every direct `newCDPSession` call site now routes through one of them.
- **`createSseEndpoint(req, config)`** in `browse/src/sse-helpers.ts` — owns the SSE cleanup contract (abort + enqueue-throw + heartbeat-throw, all idempotent). Built-in lone-surrogate sanitization on every JSON.stringify.
- **Sidebar footer RSS readout** in `extension/sidepanel.{html,js,css}` — polls `/memory` every 30s with 5-minute backoff if response time exceeds 2s. Color-coded thresholds: orange at 2 GB Bun RSS or 50 tabs, red at 8 GB or 200 tabs.
- **Tab guardrail UX** in `extension/sidepanel.js` — top-5-by-RAM toast at 200 tabs OR any single tab over 4 GB JS heap, with checkboxes + Close-selected (via `$B closetab`) + Snooze persisted in `chrome.storage.session`. Snooze bumps the thresholds so the toast stays hidden until the user accumulates more tabs or one tab grows another 2 GB.
- **Static-grep tripwire** (`browse/test/cdp-session-cleanup.test.ts`) — fails CI if any source file outside `cdp-bridge.ts` calls `newCDPSession(...)` directly.
- **45 new tests across 6 files** pinning the leak-fix invariants: CDP session lifecycle (8), SSE cleanup contract (6), modificationHistory cap + evicted-aware error (7), tab guardrail fires-once + re-arms (6), body-materialization reproducer (1), `$B memory` formatter + byte renderer + JSON entry (17).
- **4 follow-up entries in `TODOS.md`** (P2: MV3 SW memory profile, P2: native + GPU memory breakdown, P3: single-context CDP listener via `Target.setAutoAttach`, P3: real-Chromium peak-RSS reproducer for periodic tier).
#### Changed
- **`wirePageEvents.requestfinished` no longer materializes response bodies.** Pre-fix: `await res.body()` allocated a Bun `Buffer` of the full response on every fetch just to read `.length`. Post-fix: `req.sizes()` pulls the structured byte count from `Network.loadingFinished` without body fetch. Accurate for chunked transfer, gzip-encoded responses, and streaming media.
- **`modificationHistory` capped at 200 entries with FIFO eviction.** `undoModification` error now reports `"No modification at index N. History has 200 entries (most recent 200 only — M earlier entries evicted at the cap)."` when the requested index is out of range AND the buffer has overflowed.
- **`/activity/stream` and `/inspector/events` refactored through `createSseEndpoint`.** Both endpoints collapse from ~45 lines of inline `ReadableStream` wiring to ~8 lines of helper config; behavior preserved bit-for-bit.
- **`memory` command classified under the `Server` category** in `COMMAND_DESCRIPTIONS` so it appears in the generated SKILL.md tables alongside `status` / `restart` / `handoff`.
#### For contributors
- Plan completion audit: 12 of 17 plan items DONE, 2 CHANGED (deliberate scope decisions documented in the relevant commits — `req.sizes()` swap simpler than a single-context CDP listener; tab guardrail action toast wired through `$B closetab` instead of a `chrome.tabs.remove` bridge), 1 deferred to periodic tier (UI E2E tests).
- Coverage audit: 44% pre-diagnostic-tests → ~62% after adding the formatter coverage. Strong paths (CDP session lifecycle, body materialization, history cap, tab guardrail, SSE cleanup) all at 100% with invariant tests. Extension UI tests deferred (no extension test harness in this repo today).
- The CDP-session cleanup tripwire is the most reusable artifact here — any future addition of CDP work should route through the two helpers. Trying to call `newCDPSession` outside `cdp-bridge.ts` fails CI immediately with a pointer to the right helper.
## [1.49.0.0] - 2026-05-26
## **`/plan-tune` learns to ask for consent before logging, and runs the 5-question setup automatically when your profile is empty.**
Run `/plan-tune` the first time and you get an opt-in prompt. Accept and the 5-question wizard fills in your declared profile in about two minutes. Decline and `/plan-tune` never asks again. Contributors see a slightly different prompt explaining that local question-log data helps gstack calibrate, but the default is the same: off until you say yes.
If you already opted in via `gstack-config set question_tuning true` and skipped the wizard, the next `/plan-tune` runs just the 5-question setup so your profile actually has values.
Both flows write marker files in `~/.gstack/` so you're asked at most once per choice.
### Itemized changes
**Added**
- `/plan-tune` consent prompt with contributor-specific copy. Honored by `~/.gstack/.question-tuning-prompted` marker.
- `/plan-tune` setup gate. Catches `question_tuning: true` with empty `declared`. Honored by `~/.gstack/.declared-setup-prompted` marker.
**Changed**
- `TODOS.md` E1 dependency line aligned with the canonical 90-day gate in `docs/designs/PLAN_TUNING_V0.md`. The 7-day diversity gate is for displaying inferred values in `/plan-tune` output; the 90-day gate is for shipping behavior adaptation. Both gates documented inline in `plan-tune/SKILL.md.tmpl`.
- `TODOS.md` E1 substrate constraint: E1 adaptations land as advisory annotations on AskUserQuestion recommendations, not as runtime AUTO_DECIDE on inferred profile alone.
**For contributors**
- `plan-tune/SKILL.md` size budget override (50,123 → 52,963 bytes, ×1.06 vs v1.44.1 baseline). Reason logged to audit trail.
## [1.48.0.0] - 2026-05-26
## **Agents stop dropping AskUserQuestion options when there are 5+.** A new canonical preamble rule + runtime gate makes Conductor's 4-option cap a split-or-batch decision, not a silent trim.
+20
View File
@@ -294,6 +294,26 @@ response in `server.ts`, read
`browse/test/server-sanitize-surrogates.test.ts` pins the wiring with invariant
tests, so bypasses fail CI.
**SSE endpoint helper** (v1.51.0.0+). New SSE endpoints in `server.ts` MUST route
through `createSseEndpoint(req, config)` from `browse/src/sse-helpers.ts`. The
helper owns the cleanup contract (abort + enqueue-throw + heartbeat-throw, all
idempotent) and bakes in `sanitizeLoneSurrogates` on every JSON.stringify, so
new subscribers can't accidentally regress either invariant. Inline
`ReadableStream` wiring leaked subscribers when the TCP connection died without
firing `req.signal.abort` (Chromium MV3 service-worker suspend, intermediate
proxy half-close). `/activity/stream`, `/inspector/events`, and `/memory`
(SSE-eligible) all route through it. `browse/test/sse-helpers.test.ts` pins the
cleanup contract.
**CDP session lifecycle** (v1.51.0.0+). Direct `page.context().newCDPSession(page)`
calls outside `browse/src/cdp-bridge.ts` fail CI via the static-grep tripwire in
`browse/test/cdp-session-cleanup.test.ts`. Use `withCdpSession(page, async (s) => {...})`
for one-shot CDP work (try/finally detach) or `getOrCreateCdpSession(page, cache)`
for cached sessions tied to a page's lifetime (close-detach via `Map<page, session>`).
Three sites migrated: cdp-bridge frame events, write-commands archive capture,
cdp-inspector. The helpers prevent the per-session leak class where successful-path
detach happened but error-path detach was missed.
**Setup symlink hardening** (v1.38.0.0+). Every link site in `setup` MUST route
through the `_link_or_copy SRC DST` helper near the `IS_WINDOWS` detection. On
Windows without Developer Mode, plain `ln -snf` produces frozen file copies that
+1
View File
@@ -963,6 +963,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
| `disconnect` | Disconnect headed browser, return to headless mode |
| `focus [@ref]` | Bring headed browser window to foreground (macOS) |
| `handoff [message]` | Open visible Chrome at current page for user takeover |
| `memory [--json]` | Snapshot Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes. JSON output with --json. |
| `restart` | Restart server |
| `resume` | Re-snapshot after user takeover, return control to AI |
| `state save|load <name>` | Save/load browser state (cookies + URLs) |
+153 -1
View File
@@ -1,5 +1,140 @@
# TODOS
## gbrowser memory follow-ups (filed via /plan-eng-review + /codex on the v1.49 leak-fix PR)
These four items came out of the memory-leak investigation that shipped
the `$B memory` diagnostic + the four leak fixes. They were
deliberately deferred from that PR (already 14 commits / ~12 files);
each stands alone and any one could ship independently.
### P2: MV3 extension service worker memory profile
**What:** The `/memory` endpoint snapshot enumerates pages but does
not enumerate the gstack baked-in extension's service-worker target.
A long-running MV3 service worker can leak through retained DOM
snapshots, message ports that never close, alarms that re-arm, and
caches that grow without bound. The diagnostic should call
`Target.getTargets` with a filter for `service_worker` and include
each one in `tabs[]` (or a sibling `serviceWorkers[]` array) with the
same `Performance.getMetrics` data.
**Why:** Codex's outside-voice review on the eng-review surfaced this
class of leak (the extension is part of the gbrowser process tree but
invisible to today's snapshot). Until we surface it, a SW leak shows
up only in the parent process RSS with no per-target attribution.
**Pros:** Closes the per-target attribution gap for the
single-most-likely future leak source (our own extension).
**Cons:** Extension SW lifecycle is asymmetric vs page lifecycle;
auto-attach + filter is one more piece of CDP plumbing.
**Context:** Codex finding #4 on the eng-review outside voice. Not
in scope of the v1.49 PR; deliberately deferred to keep the PR to
the four highest-confidence leak fixes.
**Priority:** P2. **Effort:** M.
---
### P2: Native + GPU memory breakdown in `$B memory`
**What:** `$B memory` shows Bun RSS + per-tab JS heap + Chromium
process tree (PIDs + types + CPU time) but the per-process RSS is
absent — `SystemInfo.getProcessInfo` doesn't expose RSS and the eng
review (D2 USE_CDP) explicitly chose CDP over shelling to `ps`. The
honest next step is to surface what CDP DOES give for the other
memory categories: `Memory.getDOMCounters` per target (node + listener
counts), `SystemInfo.getInfo` for GPU memory, `Memory.getAllTimeSamplingProfile`
for a sampled native estimate.
**Why:** Codex's outside-voice review flagged that
`Performance.getMetrics` misses native memory, GPU memory, video
buffers, Skia, network cache, extension process RSS, and
browser-process RSS — all the categories where a 160 GB leak would
actually live. A diagnostic that misses the categories where the
leak class lives undersells itself.
**Pros:** Per-process category breakdown closes the gap between
"Activity Monitor says 160 GB" and what the diagnostic shows.
**Cons:** Each CDP method has its own quirks; this is a real
implementation pass, not a one-line addition.
**Context:** Codex finding #5 on the eng-review outside voice. Not
in scope of the v1.49 PR; deliberately deferred.
**Priority:** P2. **Effort:** M.
---
### P3: Single-context CDP listener for Network.loadingFinished
**What:** `wirePageEvents` attaches a `page.on('requestfinished')`
listener PER PAGE. The D10 fix removed the body-materialization leak
inside that listener but kept the per-page listener architecture
(7 listeners attached per tab — close, framenavigated, dialog,
console, request, response, requestfinished). The stretch goal from
D10 was to replace the per-page `requestfinished` listener with a
single context-level CDP listener via
`Target.setAutoAttach({autoAttach: true, waitForDebuggerOnStart: false,
flatten: true})` and a browser-wide `Network.loadingFinished` event
handler.
**Why:** Going from N to 1 listener for the request-size capture is
structurally the right architecture and removes one piece of per-tab
memory pressure. The body-materialization fix already addressed the
acute leak; this is the architectural cleanup that prevents similar
leaks in the same class.
**Pros:** One listener per browser instead of one per tab.
**Cons:** `Target.setAutoAttach` plumbing is more code than the
straight per-page listener; the marginal memory win is small on top
of the body-fetch fix that already landed.
**Context:** D10 stretch goal on the eng-review. The minimal-risk
fix shipped in v1.49 (replaces `await res.body()` with
`await req.sizes()`, preserving the per-page listener); this is the
architectural follow-up.
**Priority:** P3. **Effort:** M-L.
---
### P3: Real-Chromium peak-RSS reproducer (periodic tier)
**What:** The gate-tier reproducer
(`browse/test/memory-leak-reproducer.test.ts`) pins the invariant
that `res.body()` is never called during a burst of
`requestfinished` events. It uses a fake page; it does NOT spin up a
real Chromium nor measure peak Bun RSS during a real concurrent fetch
burst. A periodic-tier follow-up should: spin up a real headless
Chromium, navigate to a fixture page that concurrently fetches 500
mixed responses (small JSON, 100 KB images, 10 MB chunked,
gzip-compressed 2 MB), sample `process.memoryUsage().heapUsed` every
100 ms during the burst, assert `peak_heap < 200 MB above baseline`
AND `post-gc_heap < 30 MB above baseline`. Also include a single-tab
WebGL canvas variant that grows to >4 GB and asserts the per-tab RSS
toast fires.
**Why:** Codex flagged that the leak's real failure mode is transient
amplification under concurrent burst, not retained leak — a steady-state
heap test misses it. The fake-page gate-tier test catches the
listener-architecture regression; the periodic real-browser test
catches the actual peak-RSS class.
**Pros:** Closes the "did we actually demonstrate the OOM is fixed"
question with hard numbers. Feeds the ANGLE_B_NUMBERS CHANGELOG
release-summary table.
**Cons:** Periodic tier costs minutes of CI time and money per run;
real-browser memory tests are inherently flaky.
**Context:** Codex outside-voice finding on the eng-review; D7
ANGLE_B_NUMBERS CHANGELOG framing needs this reproducer's numbers
before /ship time.
**Priority:** P3. **Effort:** M.
---
## design daemon: follow-ups (filed v1.45.0.0 via /ship review army)
### ✅ DONE (v1.45.0.0): Tighten daemon test coverage
@@ -582,7 +717,24 @@ reads it yet.
**Effort:** L (human: ~1 week / CC: ~4h)
**Priority:** P0
**Depends on:** 2+ weeks of v1 dogfood, profile diversity check passing.
**Depends on:** **90+ days of v1 dogfood stable across 3+ skills** (per
`docs/designs/PLAN_TUNING_V0.md` §"Deferred to v2" E1 acceptance criteria).
Distinct from the lighter-weight diversity-display gate
(`sample_size >= 20 AND skills_covered >= 3 AND question_ids_covered >= 8
AND days_span >= 7`) used in /plan-tune to render the inferred column —
display is a UI affordance, promotion to E1 needs a much higher bar
because behavioral adaptation is consequential and hard to revert. Prior
versions of this card cited "2+ weeks" which conflicted with V0 — V0 wins.
**Substrate risk (Codex outside-voice, Phase A review 2026-05-26):** Generated
skill prose is agent-compliance-based. Tests can verify templates contain the
right reads of `~/.gstack/developer-profile.json` and the right decision
points, but tests cannot prove agents obey them at runtime. E1 ships
adaptations as **advisory annotations on AskUserQuestion recommendations**
("Recommended via your profile: <choice>") until there's a hard runtime
execution path. Do NOT gate any AUTO_DECIDE on inferred profile alone in v1
of E1; explicit per-question preferences remain the only AUTO_DECIDE
source.
### E3 — `/plan-tune narrative` + `/plan-tune vibe`
+1 -1
View File
@@ -1 +1 @@
1.51.1.0
1.52.1.0
+5 -1
View File
@@ -654,7 +654,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"autoplan","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+223
View File
@@ -0,0 +1,223 @@
#!/usr/bin/env bash
# gstack-codex-session-import — backfill question-log.jsonl from Codex sessions.
#
# Codex has no AskUserQuestion tool (per docs/spikes/codex-session-format.md).
# gstack skills running on Codex emit Decision Briefs as plain agent_message
# text, and the user's response shows up in the next user_message. This
# importer reconstructs those question/answer pairs from the structured
# JSONL session files at ~/.codex/sessions/<date>/.
#
# Usage:
# gstack-codex-session-import # latest session under ~/.codex/sessions/
# gstack-codex-session-import <path/to.jsonl> # explicit session file
# gstack-codex-session-import --since <iso> # all sessions newer than <iso>
#
# Recovery strategy (two-tier per D5/T4 spike):
# 1. Marker-first: extract <gstack-qid:foo-bar> from agent_message → stable id.
# 2. Pattern fallback: detect D<N> header + numbered options → hash id
# (source=codex-import-pattern, never used as preference key per D18).
#
# Writes via bin/gstack-question-log so source tagging, dedup, and async
# derive all apply uniformly.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
GSTACK_HOME="${GSTACK_STATE_ROOT:-${GSTACK_HOME:-$HOME/.gstack}}"
CODEX_SESSIONS_ROOT="${CODEX_SESSIONS_ROOT:-$HOME/.codex/sessions}"
MODE="latest"
EXPLICIT_PATH=""
SINCE_ISO=""
if [ $# -gt 0 ]; then
case "$1" in
--since)
MODE="since"
SINCE_ISO="${2:-}"
;;
--help|-h)
sed -n '1,/^set -euo/p' "$0" | sed 's|^# \?||'
exit 0
;;
-*)
echo "unknown flag: $1" >&2
exit 1
;;
*)
MODE="explicit"
EXPLICIT_PATH="$1"
;;
esac
fi
# Resolve list of session files to process.
SESSION_FILES=()
case "$MODE" in
explicit)
if [ ! -f "$EXPLICIT_PATH" ]; then
echo "gstack-codex-session-import: file not found: $EXPLICIT_PATH" >&2
exit 1
fi
SESSION_FILES=("$EXPLICIT_PATH")
;;
latest)
if [ ! -d "$CODEX_SESSIONS_ROOT" ]; then
echo "NO_SESSIONS: $CODEX_SESSIONS_ROOT does not exist"
exit 0
fi
LATEST=$(find "$CODEX_SESSIONS_ROOT" -type f -name "rollout-*.jsonl" -print 2>/dev/null \
| xargs ls -t 2>/dev/null | head -1 || true)
if [ -z "$LATEST" ]; then
echo "NO_SESSIONS: no rollout-*.jsonl files under $CODEX_SESSIONS_ROOT"
exit 0
fi
SESSION_FILES=("$LATEST")
;;
since)
if [ -z "$SINCE_ISO" ]; then
echo "--since requires an ISO 8601 timestamp" >&2
exit 1
fi
while IFS= read -r f; do
SESSION_FILES+=("$f")
done < <(find "$CODEX_SESSIONS_ROOT" -type f -name "rollout-*.jsonl" -newer <(date -u -d "$SINCE_ISO" 2>/dev/null || date -u) 2>/dev/null)
;;
esac
if [ ${#SESSION_FILES[@]} -eq 0 ]; then
echo "NO_SESSIONS: nothing to import"
exit 0
fi
# Parse + extract via bun. Emits one line per question found, ready to pipe
# into gstack-question-log. Tagged with source so downstream consumers
# (/plan-tune stats, dream cycle) can distinguish backfilled events from
# live captures.
IMPORTED=0
SKIPPED_NO_ANSWER=0
for SESSION_FILE in "${SESSION_FILES[@]}"; do
COUNT_LINE=$(SESSION_FILE_PATH="$SESSION_FILE" QLOG_BIN="$SCRIPT_DIR/gstack-question-log" bun -e '
const fs = require("fs");
const path = require("path");
const { spawnSync } = require("child_process");
const crypto = require("crypto");
const sessionPath = process.env.SESSION_FILE_PATH;
const qlogBin = process.env.QLOG_BIN;
const lines = fs.readFileSync(sessionPath, "utf-8").trim().split("\n").filter(Boolean);
let meta = null;
const stream = [];
for (const ln of lines) {
try {
const e = JSON.parse(ln);
if (e.type === "session_meta") meta = e.payload;
else stream.push(e);
} catch {}
}
if (!meta) {
console.error("WARN: no session_meta in " + sessionPath);
console.log("0 0");
process.exit(0);
}
const cwd = meta.cwd || "";
const sessionId = (meta.id || path.basename(sessionPath)).slice(0, 64);
// Walk for agent_message → next user_message pairs.
const briefs = [];
for (let i = 0; i < stream.length; i++) {
const e = stream[i];
if (e.type !== "event_msg" || e.payload?.type !== "agent_message") continue;
const text = String(e.payload?.message || "");
if (!text) continue;
// Detect D-numbered brief or marker. Markers are sufficient on their own.
const markerMatch = text.match(/<gstack-qid:([a-z0-9-]{1,64})>/i);
const dMatch = text.match(/^D\d+[\.\d]*\s*[—\-]\s*(.+?)$/m);
if (!markerMatch && !dMatch) continue;
// Find the next user_message in the stream.
let answer = null;
for (let j = i + 1; j < stream.length; j++) {
const e2 = stream[j];
if (e2.type === "event_msg" && e2.payload?.type === "user_message") {
answer = String(e2.payload?.message || "").trim();
break;
}
}
if (!answer) continue;
// Extract options A) ... B) ... from the brief.
const optMatches = [...text.matchAll(/^([A-Z])\)\s+(.+?)(?:\s+\(recommended\))?$/gm)];
const options = optMatches.map((m) => m[2].trim());
// Identify recommended option (label first, prose fallback).
let recommended;
const recLabel = [...text.matchAll(/^([A-Z])\)\s+(.+?)\s+\(recommended\)$/gm)];
if (recLabel.length === 1) recommended = recLabel[0][2].trim();
// Identify which option the user picked from their answer.
// Look for "A" / "A) ..." / option-label prefix match.
let userChoice = "__unknown__";
const letterMatch = answer.match(/^\s*([A-Z])\b/);
if (letterMatch) {
const idx = letterMatch[1].charCodeAt(0) - 65;
if (idx >= 0 && idx < options.length) userChoice = options[idx];
else userChoice = letterMatch[1];
} else if (options.length > 0) {
const lower = answer.toLowerCase();
const m = options.find((o) => lower.includes(o.toLowerCase().slice(0, 12)));
if (m) userChoice = m;
}
if (userChoice === "__unknown__") {
userChoice = answer.slice(0, 64);
}
const summary = (dMatch?.[1] || text.split("\n")[0]).slice(0, 200);
let questionId, source;
if (markerMatch) {
questionId = markerMatch[1];
source = "codex-import-marker";
} else {
const sortedOpts = [...options].sort().join("|");
const h = crypto.createHash("sha1").update("codex::" + summary + "::" + sortedOpts).digest("hex").slice(0, 10);
questionId = "hook-" + h;
source = "codex-import-pattern";
}
briefs.push({
skill: "codex",
question_id: questionId,
question_summary: summary,
options_count: options.length || 1,
user_choice: userChoice.slice(0, 64),
...(recommended ? { recommended: recommended.slice(0, 64) } : {}),
source,
session_id: sessionId,
// Use ts_nanos+ts shape from the event itself if available; else null.
ts: e.timestamp || undefined,
});
}
let imported = 0;
for (const b of briefs) {
const res = spawnSync(qlogBin, [JSON.stringify(b)], {
encoding: "utf-8",
stdio: ["ignore", "pipe", "pipe"],
// Run from the originating cwd so gstack-slug bucks events into the
// right project. Falls back to the importer cwd if the session cwd
// no longer exists.
cwd: cwd && fs.existsSync(cwd) ? cwd : undefined,
timeout: 5000,
});
if (res.status === 0) imported++;
}
console.log(imported + " 0");
' 2>&1)
IMP=$(echo "$COUNT_LINE" | awk "{print \$1}")
IMPORTED=$((IMPORTED + IMP))
done
echo "IMPORTED: $IMPORTED events from ${#SESSION_FILES[@]} session(s)"
+3 -1
View File
@@ -8,11 +8,13 @@
# gstack-config defaults — show just the defaults table
#
# Env overrides (for testing):
# GSTACK_STATE_ROOT — override ~/.gstack state directory (highest priority,
# matches D16 cathedral isolation convention)
# GSTACK_HOME — override ~/.gstack state directory (aligns with writer scripts)
# GSTACK_STATE_DIR — legacy alias for GSTACK_HOME (kept for backwards compat)
set -euo pipefail
STATE_DIR="${GSTACK_HOME:-${GSTACK_STATE_DIR:-$HOME/.gstack}}"
STATE_DIR="${GSTACK_STATE_ROOT:-${GSTACK_HOME:-${GSTACK_STATE_DIR:-$HOME/.gstack}}}"
CONFIG_FILE="$STATE_DIR/config.yaml"
# Annotated header for new config files. Written once on first `set`.
+2 -1
View File
@@ -28,7 +28,8 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
ROOT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
GSTACK_HOME="${GSTACK_HOME:-$HOME/.gstack}"
# GSTACK_STATE_ROOT takes precedence over GSTACK_HOME (test isolation per D16).
GSTACK_HOME="${GSTACK_STATE_ROOT:-${GSTACK_HOME:-$HOME/.gstack}}"
PROFILE_FILE="$GSTACK_HOME/developer-profile.json"
LEGACY_FILE="$GSTACK_HOME/builder-profile.jsonl"
eval "$("$SCRIPT_DIR/gstack-slug" 2>/dev/null || true)"
+181
View File
@@ -0,0 +1,181 @@
#!/usr/bin/env bash
# gstack-distill-apply — apply a single distillation proposal after user Y.
#
# Plan-tune cathedral T11. Reads distillation-proposals.json, applies the
# Nth proposal to the right surface:
#
# preference → gstack-question-preference --write
# declared-nudge → atomic update to ~/.gstack/developer-profile.json declared
# memory-nugget → append to ~/.gstack/free-text-memory.json (local fallback)
#
# Always confirm before calling this from the skill — the bin assumes the user
# already approved (Codex #15 trust boundary). The skill template (/plan-tune
# distill review section) handles the confirm UX.
#
# gbrain integration: when gbrain is configured, the skill template ALSO
# invokes mcp__gbrain__put_page / extract_facts / add_tag in the same turn
# (those are MCP tools, not CLI-callable). Pass --gbrain-published true to
# mark the proposal as mirrored to gbrain. The local file always gets the
# write so it's the durable source-of-truth even on machines without gbrain.
#
# Usage:
# gstack-distill-apply --proposal <N> # apply Nth proposal
# gstack-distill-apply --proposal <N> --gbrain-published true
# gstack-distill-apply --list # show pending proposals
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
GSTACK_HOME="${GSTACK_STATE_ROOT:-${GSTACK_HOME:-$HOME/.gstack}}"
eval "$("$SCRIPT_DIR/gstack-slug" 2>/dev/null || true)"
SLUG="${SLUG:-unknown}"
PROJECT_DIR="$GSTACK_HOME/projects/$SLUG"
PROPOSAL_FILE="$PROJECT_DIR/distillation-proposals.json"
MEMORY_FILE="$GSTACK_HOME/free-text-memory.json"
PROFILE_FILE="$GSTACK_HOME/developer-profile.json"
ACTION="apply"
PROPOSAL_IDX=""
GBRAIN_PUBLISHED="false"
while [ $# -gt 0 ]; do
case "$1" in
--proposal) PROPOSAL_IDX="$2"; shift 2 ;;
--gbrain-published) GBRAIN_PUBLISHED="$2"; shift 2 ;;
--list) ACTION="list"; shift ;;
--help|-h)
sed -n '1,/^set -euo/p' "$0" | sed 's|^# \?||'
exit 0
;;
*) echo "unknown arg: $1" >&2; exit 1 ;;
esac
done
if [ ! -f "$PROPOSAL_FILE" ]; then
echo "NO_PROPOSALS: $PROPOSAL_FILE missing — run gstack-distill-free-text first"
exit 0
fi
if [ "$ACTION" = "list" ]; then
PROPOSAL_FILE_PATH="$PROPOSAL_FILE" bun -e '
const fs = require("fs");
const p = JSON.parse(fs.readFileSync(process.env.PROPOSAL_FILE_PATH, "utf-8"));
const proposals = p.proposals || [];
if (proposals.length === 0) { console.log("(no proposals)"); process.exit(0); }
console.log("GENERATED: " + p.generated_at);
console.log("SOURCE_EVENTS: " + (p.source_event_count || 0));
proposals.forEach((pr, i) => {
console.log("");
console.log("[" + i + "] " + (pr.kind || "?") + " (confidence: " + (pr.confidence || "?") + ")");
if (pr.rationale) console.log(" rationale: " + pr.rationale);
if (pr.kind === "preference") {
console.log(" question_id: " + pr.question_id);
console.log(" preference: " + pr.preference);
} else if (pr.kind === "declared-nudge") {
console.log(" dimension: " + pr.dimension);
console.log(" direction: " + pr.direction + " (" + (pr.magnitude || "?") + ")");
} else if (pr.kind === "memory-nugget") {
console.log(" nugget: " + pr.nugget);
console.log(" signal_keys: " + JSON.stringify(pr.applies_to_signal_keys || []));
}
if (pr.source_quotes && pr.source_quotes.length) {
console.log(" quotes:");
pr.source_quotes.forEach((q) => console.log(" - \"" + q + "\""));
}
});
'
exit 0
fi
if [ -z "$PROPOSAL_IDX" ]; then
echo "--proposal <N> required" >&2
exit 1
fi
# Apply via bun. Each kind has its own surface.
mkdir -p "$PROJECT_DIR"
PROPOSAL_IDX="$PROPOSAL_IDX" \
PROPOSAL_FILE_PATH="$PROPOSAL_FILE" \
MEMORY_FILE_PATH="$MEMORY_FILE" \
PROFILE_FILE_PATH="$PROFILE_FILE" \
PREF_BIN="$SCRIPT_DIR/gstack-question-preference" \
GBRAIN_PUBLISHED="$GBRAIN_PUBLISHED" \
bun -e '
const fs = require("fs");
const { spawnSync } = require("child_process");
const idx = parseInt(process.env.PROPOSAL_IDX, 10);
const p = JSON.parse(fs.readFileSync(process.env.PROPOSAL_FILE_PATH, "utf-8"));
const proposals = p.proposals || [];
if (!Number.isInteger(idx) || idx < 0 || idx >= proposals.length) {
process.stderr.write("invalid --proposal index " + idx + " (have " + proposals.length + ")\n");
process.exit(1);
}
const pr = proposals[idx];
const stamp = new Date().toISOString();
// Memory-nugget: always write to local file (durable source-of-truth even
// when gbrain is configured — gbrain is mirror, file is canon for the
// PreToolUse hook injection path in Layer 8).
if (pr.kind === "memory-nugget") {
const memPath = process.env.MEMORY_FILE_PATH;
let mem = { nuggets: [] };
try { mem = JSON.parse(fs.readFileSync(memPath, "utf-8")); } catch {}
if (!Array.isArray(mem.nuggets)) mem.nuggets = [];
mem.nuggets.push({
nugget: pr.nugget,
applies_to_signal_keys: pr.applies_to_signal_keys || [],
applied_at: stamp,
gbrain_published: process.env.GBRAIN_PUBLISHED === "true",
source_quotes: pr.source_quotes || [],
});
const tmp = memPath + ".tmp";
fs.writeFileSync(tmp, JSON.stringify(mem, null, 2));
fs.renameSync(tmp, memPath);
console.log("APPLIED: memory-nugget appended to " + memPath);
}
// Preference: route through gstack-question-preference for the user-origin
// gate + event audit trail. source=plan-tune is the allowed value since
// the user opt-in came from inside /plan-tune.
if (pr.kind === "preference") {
const res = spawnSync(process.env.PREF_BIN, [
"--write",
JSON.stringify({
question_id: pr.question_id,
preference: pr.preference,
source: "plan-tune",
free_text: (pr.source_quotes || []).join(" | ").slice(0, 300),
}),
], { encoding: "utf-8", stdio: ["ignore", "pipe", "pipe"], timeout: 5000 });
if (res.status !== 0) {
process.stderr.write("preference apply failed: " + (res.stderr || res.stdout) + "\n");
process.exit(1);
}
console.log("APPLIED: preference " + pr.question_id + " → " + pr.preference);
}
// Declared-nudge: atomic update to developer-profile.json declared. Magnitude
// tiers: small=0.05, medium=0.10, large=0.15. Clamp to [0, 1].
if (pr.kind === "declared-nudge") {
const mag = { small: 0.05, medium: 0.10, large: 0.15 }[pr.magnitude || "small"] || 0.05;
const delta = pr.direction === "down" ? -mag : mag;
const profilePath = process.env.PROFILE_FILE_PATH;
let profile = {};
try { profile = JSON.parse(fs.readFileSync(profilePath, "utf-8")); } catch {}
profile.declared = profile.declared || {};
const cur = typeof profile.declared[pr.dimension] === "number" ? profile.declared[pr.dimension] : 0.5;
const next = Math.max(0, Math.min(1, cur + delta));
profile.declared[pr.dimension] = +next.toFixed(3);
profile.declared_at = stamp;
const tmp = profilePath + ".tmp";
fs.writeFileSync(tmp, JSON.stringify(profile, null, 2));
fs.renameSync(tmp, profilePath);
console.log("APPLIED: declared." + pr.dimension + " " + cur + " → " + profile.declared[pr.dimension]);
}
// Mark the proposal as applied so /plan-tune list shows it consumed.
pr.applied_at = stamp;
pr.gbrain_published = process.env.GBRAIN_PUBLISHED === "true";
const tmp = process.env.PROPOSAL_FILE_PATH + ".tmp";
fs.writeFileSync(tmp, JSON.stringify(p, null, 2));
fs.renameSync(tmp, process.env.PROPOSAL_FILE_PATH);
'
+272
View File
@@ -0,0 +1,272 @@
#!/usr/bin/env bash
# gstack-distill-free-text — Layer 8 "dream cycle" batch distiller.
#
# Reads auq-other free-text events from this project's question-log.jsonl,
# sends them to Claude via the Anthropic SDK, and writes structured proposals
# the user can review via /plan-tune distill. Proposals require explicit
# user Y before applying — never autonomous (Codex #15 trust boundary).
#
# Usage:
# gstack-distill-free-text # sync, prompts at end
# gstack-distill-free-text --background # spawn detached; results
# # surface on next /plan-tune
# gstack-distill-free-text --dry-run # show prompt, no API call
# gstack-distill-free-text --status # show last-run stats
#
# No rate cap — the natural rate of free-text events (rare; user has to type
# "Other" then content) bounds this loop already. Each Haiku call is ~$0.01,
# so even a runaway at one-per-minute would be ~$14/day worst case. The
# cumulative cost log at $GSTACK_STATE_ROOT/distill-cost.jsonl gives full
# auditability via --status when you want it.
# Per D6: Anthropic SDK direct call, fail-loud on missing ANTHROPIC_API_KEY.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
ROOT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
GSTACK_HOME="${GSTACK_STATE_ROOT:-${GSTACK_HOME:-$HOME/.gstack}}"
eval "$("$SCRIPT_DIR/gstack-slug" 2>/dev/null || true)"
SLUG="${SLUG:-unknown}"
PROJECT_DIR="$GSTACK_HOME/projects/$SLUG"
LOG_FILE="$PROJECT_DIR/question-log.jsonl"
PROPOSAL_FILE="$PROJECT_DIR/distillation-proposals.json"
COST_LOG="$GSTACK_HOME/distill-cost.jsonl"
mkdir -p "$PROJECT_DIR"
MODE="sync"
case "${1:-}" in
--background) MODE="background" ;;
--dry-run) MODE="dry-run" ;;
--status) MODE="status" ;;
--help|-h)
sed -n '1,/^set -euo/p' "$0" | sed 's|^# \?||'
exit 0
;;
'') ;;
*) echo "unknown arg: $1" >&2; exit 1 ;;
esac
# --- Status subcommand --------------------------------------------------
if [ "$MODE" = "status" ]; then
COST_LOG_PATH="$COST_LOG" SLUG_PATH="$SLUG" bun -e '
const fs = require("fs");
const slug = process.env.SLUG_PATH;
const path = process.env.COST_LOG_PATH;
if (!fs.existsSync(path)) { console.log("no distill runs yet"); process.exit(0); }
const lines = fs.readFileSync(path, "utf-8").trim().split("\n").filter(Boolean);
const mine = lines.map((l) => JSON.parse(l)).filter((e) => e.slug === slug);
if (mine.length === 0) { console.log("no distill runs yet for slug=" + slug); process.exit(0); }
const totalUsd = mine.reduce((a, e) => a + (e.cost_usd_est || 0), 0);
const todayIso = new Date().toISOString().slice(0, 10);
const today = mine.filter((e) => (e.ts || "").startsWith(todayIso));
const todayUsd = today.reduce((a, e) => a + (e.cost_usd_est || 0), 0);
console.log("RUNS: " + mine.length);
console.log("TODAY: " + today.length + " run(s), $" + todayUsd.toFixed(4));
console.log("ESTIMATED_TOTAL_USD: $" + totalUsd.toFixed(4));
const last = mine[mine.length - 1];
console.log("LAST_RUN: " + (last.ts || "?") + " | " + (last.proposals_count || 0) + " proposals");
'
exit 0
fi
# --- Background mode: detach + invoke self synchronously ---------------
if [ "$MODE" = "background" ]; then
nohup "$0" >/dev/null 2>&1 &
echo "DISTILL_SPAWNED: pid=$!"
exit 0
fi
# No rate cap. Natural input rate (free-text events are rare) + Haiku price
# (~$0.01/run) keep this bounded. Use --status to audit spend.
# --- Gather unprocessed auq-other events from this project -------------
if [ ! -f "$LOG_FILE" ]; then
echo "NO_LOG: no question-log.jsonl in $PROJECT_DIR"
exit 0
fi
EVENTS_JSON=$(LOG_FILE_PATH="$LOG_FILE" bun -e '
const fs = require("fs");
const lines = fs.readFileSync(process.env.LOG_FILE_PATH, "utf-8").trim().split("\n").filter(Boolean);
const out = [];
for (const l of lines) {
try {
const e = JSON.parse(l);
if (e.source === "auq-other" && !e.distilled_at && e.free_text) {
out.push({
ts: e.ts,
question_id: e.question_id,
question_summary: e.question_summary,
free_text: e.free_text,
session_id: e.session_id,
});
}
} catch {}
}
process.stdout.write(JSON.stringify(out));
')
EVENT_COUNT=$(printf '%s' "$EVENTS_JSON" | bun -e 'const a = JSON.parse(await Bun.stdin.text()); console.log(a.length);')
if [ "$EVENT_COUNT" -eq 0 ]; then
echo "NO_FREE_TEXT: nothing to distill"
exit 0
fi
# --- Build distill prompt ---------------------------------------------
# Heredoc into temp file (avoids $(cat <<'PROMPT'...) which choked the
# bash parser on apostrophes elsewhere in the script).
DISTILL_PROMPT_FILE=$(mktemp)
trap 'rm -f "$DISTILL_PROMPT_FILE"' EXIT
cat > "$DISTILL_PROMPT_FILE" <<'PROMPT'
You are gstack dream-cycle distiller. Below are free-text responses the
user typed into AskUserQuestion prompts (option "Other") across recent gstack
sessions. For each response, extract structured signal that should update the
user plan-tune profile or preferences.
Return strict JSON with this shape:
{
"proposals": [
{
"kind": "preference" | "declared-nudge" | "memory-nugget",
"confidence": 0.0-1.0,
"source_quotes": ["<verbatim quote 1>", "<verbatim quote 2>"],
"question_id": "<id>",
"preference": "never-ask" | "always-ask" | "ask-only-for-one-way",
"dimension": "scope_appetite | risk_tolerance | detail_preference | autonomy | architecture_care",
"direction": "up | down",
"magnitude": "small | medium | large",
"rationale": "<one sentence>",
"nugget": "<one-line memory>",
"applies_to_signal_keys": ["scope-appetite", "..."]
}
]
}
Rules:
- Reject any proposal where confidence < 0.7.
- Quote VERBATIM from the user free_text. Never paraphrase a source quote.
- A single user response may produce multiple proposals.
- If nothing meaningful to extract, return {"proposals": []}.
- No commentary outside the JSON.
PROMPT
DISTILL_PROMPT=$(cat "$DISTILL_PROMPT_FILE")
# --- Dry-run: emit prompt + events, exit ------------------------------
if [ "$MODE" = "dry-run" ]; then
echo "=== DISTILL PROMPT ==="
echo "$DISTILL_PROMPT"
echo
echo "=== EVENTS ($EVENT_COUNT) ==="
echo "$EVENTS_JSON" | bun -e 'console.log(JSON.stringify(JSON.parse(await Bun.stdin.text()), null, 2));'
exit 0
fi
# --- SDK call: fail-loud on missing key -------------------------------
if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
cat <<EOF >&2
gstack-distill-free-text: ANTHROPIC_API_KEY not set.
Dream-cycle distillation needs an API key for the SDK call. Set
ANTHROPIC_API_KEY in your environment, or run with --dry-run to see
what would be sent without actually calling.
Note: this is a separate billing/auth surface from your interactive
Claude Code session (per Codex correction in D6).
EOF
exit 1
fi
# Run the SDK call in bun. Emits JSON: {proposals_count, cost_usd_est}.
RESULT=$(EVENTS_JSON="$EVENTS_JSON" DISTILL_PROMPT="$DISTILL_PROMPT" \
PROPOSAL_FILE_PATH="$PROPOSAL_FILE" LOG_FILE_PATH="$LOG_FILE" \
ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
bun --cwd "$ROOT_DIR" -e '
const fs = require("fs");
const Anthropic = require("@anthropic-ai/sdk").default;
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const events = JSON.parse(process.env.EVENTS_JSON);
const prompt = process.env.DISTILL_PROMPT + "\n\nFREE-TEXT RESPONSES (JSON array):\n" + JSON.stringify(events, null, 2);
// Pricing (Haiku 4.5 — cheap, fast, sufficient for structured extraction).
// Per token, USD: input $0.001/1k = 1e-6, output $0.005/1k = 5e-6.
const INPUT_PER_TOKEN = 1e-6;
const OUTPUT_PER_TOKEN = 5e-6;
const resp = await client.messages.create({
model: "claude-haiku-4-5-20251001",
max_tokens: 4096,
messages: [{ role: "user", content: prompt }],
});
const text = resp.content.map((b) => (b.type === "text" ? b.text : "")).join("");
// Strip optional fenced code blocks the model may wrap JSON in.
const stripped = text.replace(/^```(?:json)?\s*/i, "").replace(/```\s*$/i, "").trim();
let parsed;
try { parsed = JSON.parse(stripped); } catch (e) {
process.stderr.write("DISTILL: model returned non-JSON: " + text.slice(0, 200) + "\n");
process.exit(1);
}
const proposals = Array.isArray(parsed.proposals) ? parsed.proposals : [];
// Keep only proposals with confidence >= 0.7 (model is told this rule;
// double-check in case it slipped).
const filtered = proposals.filter((p) => typeof p.confidence === "number" && p.confidence >= 0.7);
// Write proposals file (overwrite — only the latest run is reviewable).
fs.writeFileSync(process.env.PROPOSAL_FILE_PATH, JSON.stringify({
generated_at: new Date().toISOString(),
source_event_count: events.length,
proposals: filtered,
}, null, 2));
// Mark source events as distilled_at so they do not re-propose.
// Update question-log.jsonl in place: read all, rewrite with distilled_at
// set on the matching events. Match by ts + question_id.
const logPath = process.env.LOG_FILE_PATH;
const distilledAt = new Date().toISOString();
const matchKeys = new Set(events.map((e) => (e.ts || "") + "::" + (e.question_id || "")));
const lines = fs.readFileSync(logPath, "utf-8").split("\n");
const out = [];
for (const ln of lines) {
if (!ln.trim()) { out.push(ln); continue; }
try {
const e = JSON.parse(ln);
const key = (e.ts || "") + "::" + (e.question_id || "");
if (matchKeys.has(key)) {
e.distilled_at = distilledAt;
out.push(JSON.stringify(e));
} else {
out.push(ln);
}
} catch { out.push(ln); }
}
fs.writeFileSync(logPath, out.join("\n"));
// Cost estimate from usage tokens.
const usage = resp.usage || {};
const inTok = usage.input_tokens || 0;
const outTok = usage.output_tokens || 0;
const cost = inTok * INPUT_PER_TOKEN + outTok * OUTPUT_PER_TOKEN;
process.stdout.write(JSON.stringify({
proposals_count: filtered.length,
rejected_low_confidence: proposals.length - filtered.length,
input_tokens: inTok,
output_tokens: outTok,
cost_usd_est: cost,
}));
')
# Append cost log line.
TS=$(date -u +%Y-%m-%dT%H:%M:%SZ)
echo "{\"ts\":\"$TS\",\"slug\":\"$SLUG\",$(echo "$RESULT" | sed 's/^{//; s/}$//')}" >> "$COST_LOG"
echo "DISTILL_COMPLETE:"
echo " proposals_file: $PROPOSAL_FILE"
echo " $RESULT"
+82 -3
View File
@@ -28,7 +28,8 @@
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
eval "$("$SCRIPT_DIR/gstack-slug" 2>/dev/null)"
GSTACK_HOME="${GSTACK_HOME:-$HOME/.gstack}"
# GSTACK_STATE_ROOT takes precedence over GSTACK_HOME (test isolation per D16).
GSTACK_HOME="${GSTACK_STATE_ROOT:-${GSTACK_HOME:-$HOME/.gstack}}"
mkdir -p "$GSTACK_HOME/projects/$SLUG"
INPUT="$1"
@@ -49,12 +50,48 @@ if (!j.skill || !/^[a-z0-9-]+\$/.test(j.skill)) {
process.exit(1);
}
// Required: question_id (kebab-case, <=64 chars)
// Required: question_id (kebab-case, <=64 chars).
// Cathedral T5: hook-sourced events use 'hook-<10-char-hash>' which is
// kebab-case-compatible and passes the same regex.
if (!j.question_id || !/^[a-z0-9-]+\$/.test(j.question_id) || j.question_id.length > 64) {
process.stderr.write('gstack-question-log: invalid question_id, must be kebab-case <=64 chars\n');
process.exit(1);
}
// Optional: source — tags which writer produced this event.
// 'agent' (default) — preamble-driven write from inside the running agent
// 'hook' — PostToolUse hook captured it deterministically (T5)
// 'auq-other' — user picked 'Other' and typed free text (Layer 8)
// 'auto-decided' — PreToolUse enforcement hook substituted the answer (T6)
// 'codex-import-marker' / 'codex-import-pattern' — T9 backfill from Codex
const ALLOWED_SOURCES = ['agent', 'hook', 'auq-other', 'auto-decided', 'codex-import-marker', 'codex-import-pattern'];
if (j.source !== undefined) {
if (!ALLOWED_SOURCES.includes(j.source)) {
process.stderr.write('gstack-question-log: invalid source, must be one of: ' + ALLOWED_SOURCES.join(', ') + '\n');
process.exit(1);
}
} else {
j.source = 'agent';
}
// Optional: tool_use_id — Claude Code hook stdin field; used for dedup.
if (j.tool_use_id !== undefined) {
if (typeof j.tool_use_id !== 'string' || j.tool_use_id.length > 128) {
process.stderr.write('gstack-question-log: tool_use_id must be string <=128 chars\n');
process.exit(1);
}
}
// Optional: free_text — sanitize (no newlines, <=300 chars).
if (j.free_text !== undefined) {
if (typeof j.free_text !== 'string') {
process.stderr.write('gstack-question-log: free_text must be string\n');
process.exit(1);
}
if (j.free_text.length > 300) j.free_text = j.free_text.slice(0, 300);
j.free_text = j.free_text.replace(/\n+/g, ' ');
}
// Required: question_summary (non-empty, <=200 chars, no newlines)
if (typeof j.question_summary !== 'string' || !j.question_summary.length) {
process.stderr.write('gstack-question-log: question_summary required\n');
@@ -164,7 +201,49 @@ if [ $VALIDATE_RC -ne 0 ] || [ -z "$VALIDATED" ]; then
exit 1
fi
echo "$VALIDATED" >> "$GSTACK_HOME/projects/$SLUG/question-log.jsonl"
LOG_FILE="$GSTACK_HOME/projects/$SLUG/question-log.jsonl"
# Cathedral T5: composite-source dedup. If this exact (source, tool_use_id)
# was already logged within the last 100 lines, skip — protects against
# hook + agent both writing the same fire (D3 plan-tune cathedral decision).
# Lookup is bounded so the bin stays cheap on hot paths.
DEDUP_SKIP=""
if [ -f "$LOG_FILE" ]; then
DEDUP_SKIP=$(VALIDATED_JSON="$VALIDATED" LOG_FILE_PATH="$LOG_FILE" bun -e '
const fs = require("fs");
const j = JSON.parse(process.env.VALIDATED_JSON);
if (!j.tool_use_id) { console.log(""); process.exit(0); }
const want = j.source + ":" + j.tool_use_id;
const lines = fs.readFileSync(process.env.LOG_FILE_PATH, "utf-8").trim().split("\n").slice(-100);
for (const ln of lines) {
try {
const p = JSON.parse(ln);
if (p.source && p.tool_use_id && (p.source + ":" + p.tool_use_id) === want) {
console.log("dup");
process.exit(0);
}
} catch {}
}
console.log("");
' 2>/dev/null)
fi
if [ "$DEDUP_SKIP" = "dup" ]; then
echo "DEDUP: skipped (source=$(echo "$VALIDATED" | bun -e 'const j=JSON.parse(await Bun.stdin.text()); console.log(j.source);'), tool_use_id duplicate)"
exit 0
fi
echo "$VALIDATED" >> "$LOG_FILE"
# Cathedral T5: fire-and-forget --derive so inferred dimensions stay current
# without per-event latency (D17). Sub-second op; output suppressed; never
# blocks the hook caller. Skipped via GSTACK_QUESTION_LOG_NO_DERIVE=1 for
# tests that don't want the side effect.
if [ -z "${GSTACK_QUESTION_LOG_NO_DERIVE:-}" ]; then
(
nohup "$SCRIPT_DIR/gstack-developer-profile" --derive >/dev/null 2>&1 &
) >/dev/null 2>&1
fi
# NOTE: question-log.jsonl is deliberately NOT enqueued for gbrain-sync.
# Per Codex v2 review, audit/derivation data stays local alongside the
+2 -1
View File
@@ -23,7 +23,8 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
ROOT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
GSTACK_HOME="${GSTACK_HOME:-$HOME/.gstack}"
# GSTACK_STATE_ROOT takes precedence over GSTACK_HOME (test isolation per D16).
GSTACK_HOME="${GSTACK_STATE_ROOT:-${GSTACK_HOME:-$HOME/.gstack}}"
eval "$("$SCRIPT_DIR/gstack-slug" 2>/dev/null || true)"
SLUG="${SLUG:-unknown}"
PREF_FILE="$GSTACK_HOME/projects/$SLUG/question-preferences.json"
+237 -34
View File
@@ -1,21 +1,44 @@
#!/usr/bin/env bash
# gstack-settings-hook — add/remove SessionStart hooks in Claude Code settings.json
# gstack-settings-hook — manage Claude Code hooks in ~/.claude/settings.json
#
# Usage:
# gstack-settings-hook add <hook-command> # add SessionStart hook
# gstack-settings-hook remove <hook-command> # remove SessionStart hook
# Two shapes:
#
# 1. Legacy (SessionStart only — used by setup --team and gstack-uninstall):
# gstack-settings-hook add <cmd> # adds SessionStart hook
# gstack-settings-hook remove <cmd> # removes matching SessionStart hook
#
# 2. Schema-aware (plan-tune cathedral T3 — supports PreToolUse + PostToolUse):
# gstack-settings-hook add-event --event <SessionStart|PreToolUse|PostToolUse> \
# --command <cmd> --source <tag> [--matcher <regex>] [--timeout <s>]
# gstack-settings-hook remove-source --source <tag>
# gstack-settings-hook diff-event --event ... --command ... --source ... [--matcher ...]
# gstack-settings-hook rollback # restore latest backup
# gstack-settings-hook list-sources # show all gstack-tagged hook entries
#
# Every add-event/remove-source writes a backup to ~/.claude/settings.json.bak.<ts>
# before mutating (Codex correction — silent settings.json mutation is wrong).
#
# Dedup: legacy `add`/`remove` dedupe by the historical `gstack-session-update`
# substring. Schema-aware `add-event` dedupes by (event, matcher, _gstack_source) so
# multiple gstack registrations (plan-tune, ...) don't collide.
#
# Requires: bun (already a gstack hard dependency)
# Writes atomically: .tmp + rename to prevent corruption on crash/disk-full.
set -euo pipefail
ACTION="${1:-}"
HOOK_CMD="${2:-}"
SETTINGS_FILE="${GSTACK_SETTINGS_FILE:-$HOME/.claude/settings.json}"
if [ -z "$ACTION" ] || [ -z "$HOOK_CMD" ]; then
echo "Usage: gstack-settings-hook {add|remove} <hook-command>" >&2
if [ -z "$ACTION" ]; then
cat <<EOF >&2
Usage:
gstack-settings-hook add <hook-command> # legacy SessionStart add
gstack-settings-hook remove <hook-command> # legacy SessionStart remove
gstack-settings-hook add-event --event <name> --command <cmd> --source <tag> [--matcher <re>] [--timeout <s>]
gstack-settings-hook remove-source --source <tag>
gstack-settings-hook diff-event --event <name> --command <cmd> --source <tag> [--matcher <re>] [--timeout <s>]
gstack-settings-hook rollback
gstack-settings-hook list-sources
EOF
exit 1
fi
@@ -24,59 +47,239 @@ if ! command -v bun >/dev/null 2>&1; then
exit 1
fi
backup_settings() {
if [ -f "$SETTINGS_FILE" ]; then
local ts
ts=$(date +%Y%m%d-%H%M%S)
cp "$SETTINGS_FILE" "$SETTINGS_FILE.bak.$ts"
echo "$SETTINGS_FILE.bak.$ts" > "$SETTINGS_FILE.bak-latest"
fi
}
# --- legacy SessionStart add/remove (backwards compat) -----------------
case "$ACTION" in
add)
GSTACK_SETTINGS_PATH="$SETTINGS_FILE" GSTACK_HOOK_CMD="$HOOK_CMD" bun -e "
const fs = require('fs');
HOOK_CMD="${2:-}"
if [ -z "$HOOK_CMD" ]; then
echo "Usage: gstack-settings-hook add <hook-command>" >&2
exit 1
fi
backup_settings
GSTACK_SETTINGS_PATH="$SETTINGS_FILE" GSTACK_HOOK_CMD="$HOOK_CMD" bun -e '
const fs = require("fs");
const settingsPath = process.env.GSTACK_SETTINGS_PATH;
const hookCmd = process.env.GSTACK_HOOK_CMD;
let settings = {};
try { settings = JSON.parse(fs.readFileSync(settingsPath, 'utf8')); } catch {}
try { settings = JSON.parse(fs.readFileSync(settingsPath, "utf8")); } catch {}
if (!settings.hooks) settings.hooks = {};
if (!settings.hooks.SessionStart) settings.hooks.SessionStart = [];
// Dedup: check if hook command already registered
const exists = settings.hooks.SessionStart.some(entry =>
entry.hooks && entry.hooks.some(h => h.command && h.command.includes('gstack-session-update'))
entry.hooks && entry.hooks.some(h => h.command && h.command.includes("gstack-session-update"))
);
if (!exists) {
settings.hooks.SessionStart.push({
hooks: [{ type: 'command', command: hookCmd }]
hooks: [{ type: "command", command: hookCmd }]
});
}
const tmp = settingsPath + '.tmp';
fs.writeFileSync(tmp, JSON.stringify(settings, null, 2) + '\n');
const tmp = settingsPath + ".tmp";
fs.writeFileSync(tmp, JSON.stringify(settings, null, 2) + "\n");
fs.renameSync(tmp, settingsPath);
" 2>/dev/null
' 2>/dev/null
;;
remove)
HOOK_CMD="${2:-}"
if [ -z "$HOOK_CMD" ]; then
echo "Usage: gstack-settings-hook remove <hook-command>" >&2
exit 1
fi
[ -f "$SETTINGS_FILE" ] || exit 1
GSTACK_SETTINGS_PATH="$SETTINGS_FILE" bun -e "
const fs = require('fs');
backup_settings
GSTACK_SETTINGS_PATH="$SETTINGS_FILE" bun -e '
const fs = require("fs");
const settingsPath = process.env.GSTACK_SETTINGS_PATH;
let settings = {};
try { settings = JSON.parse(fs.readFileSync(settingsPath, 'utf8')); } catch { process.exit(0); }
try { settings = JSON.parse(fs.readFileSync(settingsPath, "utf8")); } catch { process.exit(0); }
if (settings.hooks && settings.hooks.SessionStart) {
settings.hooks.SessionStart = settings.hooks.SessionStart.filter(entry =>
!(entry.hooks && entry.hooks.some(h => h.command && h.command.includes('gstack-session-update')))
!(entry.hooks && entry.hooks.some(h => h.command && h.command.includes("gstack-session-update")))
);
if (settings.hooks.SessionStart.length === 0) delete settings.hooks.SessionStart;
if (Object.keys(settings.hooks).length === 0) delete settings.hooks;
}
const tmp = settingsPath + '.tmp';
fs.writeFileSync(tmp, JSON.stringify(settings, null, 2) + '\n');
const tmp = settingsPath + ".tmp";
fs.writeFileSync(tmp, JSON.stringify(settings, null, 2) + "\n");
fs.renameSync(tmp, settingsPath);
" 2>/dev/null
' 2>/dev/null
;;
add-event|diff-event)
EVENT=""
COMMAND=""
SOURCE=""
MATCHER=""
TIMEOUT=""
shift
while [ $# -gt 0 ]; do
case "$1" in
--event) EVENT="$2"; shift 2 ;;
--command) COMMAND="$2"; shift 2 ;;
--source) SOURCE="$2"; shift 2 ;;
--matcher) MATCHER="$2"; shift 2 ;;
--timeout) TIMEOUT="$2"; shift 2 ;;
*) echo "unknown flag: $1" >&2; exit 1 ;;
esac
done
if [ -z "$EVENT" ] || [ -z "$COMMAND" ] || [ -z "$SOURCE" ]; then
echo "add-event/diff-event require --event, --command, --source" >&2
exit 1
fi
case "$EVENT" in
SessionStart|PreToolUse|PostToolUse|UserPromptSubmit|Stop|Notification) ;;
*) echo "invalid --event '$EVENT'; must be one of SessionStart|PreToolUse|PostToolUse|UserPromptSubmit|Stop|Notification" >&2; exit 1 ;;
esac
if [ "$ACTION" = "add-event" ]; then
backup_settings
fi
DIFF_ONLY=""
if [ "$ACTION" = "diff-event" ]; then DIFF_ONLY=1; fi
GSTACK_SETTINGS_PATH="$SETTINGS_FILE" \
GSTACK_EVENT="$EVENT" \
GSTACK_COMMAND="$COMMAND" \
GSTACK_SOURCE="$SOURCE" \
GSTACK_MATCHER="$MATCHER" \
GSTACK_TIMEOUT="$TIMEOUT" \
GSTACK_DIFF_ONLY="$DIFF_ONLY" \
bun -e '
const fs = require("fs");
const settingsPath = process.env.GSTACK_SETTINGS_PATH;
const event = process.env.GSTACK_EVENT;
const cmd = process.env.GSTACK_COMMAND;
const source = process.env.GSTACK_SOURCE;
const matcher = process.env.GSTACK_MATCHER || "";
const timeoutRaw = process.env.GSTACK_TIMEOUT || "";
const diffOnly = process.env.GSTACK_DIFF_ONLY === "1";
let settings = {};
try { settings = JSON.parse(fs.readFileSync(settingsPath, "utf8")); } catch {}
const before = JSON.stringify(settings, null, 2);
if (!settings.hooks) settings.hooks = {};
if (!settings.hooks[event]) settings.hooks[event] = [];
const matchesEntry = (entry) => {
const sameMatcher = (entry.matcher || "") === matcher;
const sameSource = entry._gstack_source === source;
return sameMatcher && sameSource;
};
let existing = settings.hooks[event].find(matchesEntry);
const hookEntry = { type: "command", command: cmd };
if (timeoutRaw) {
const n = Number(timeoutRaw);
if (Number.isFinite(n) && n > 0) hookEntry.timeout = n;
}
if (existing) {
existing.hooks = [hookEntry];
} else {
const newEntry = { _gstack_source: source, hooks: [hookEntry] };
if (matcher) newEntry.matcher = matcher;
settings.hooks[event].push(newEntry);
}
const after = JSON.stringify(settings, null, 2);
if (diffOnly) {
console.log("--- BEFORE");
console.log(before);
console.log("--- AFTER");
console.log(after);
process.exit(0);
}
const tmp = settingsPath + ".tmp";
fs.writeFileSync(tmp, after + "\n");
fs.renameSync(tmp, settingsPath);
console.log("OK: " + event + " hook registered (source: " + source + ")");
'
;;
remove-source)
SOURCE=""
shift
while [ $# -gt 0 ]; do
case "$1" in
--source) SOURCE="$2"; shift 2 ;;
*) echo "unknown flag: $1" >&2; exit 1 ;;
esac
done
if [ -z "$SOURCE" ]; then
echo "remove-source requires --source <tag>" >&2
exit 1
fi
[ -f "$SETTINGS_FILE" ] || exit 0
backup_settings
GSTACK_SETTINGS_PATH="$SETTINGS_FILE" GSTACK_SOURCE="$SOURCE" bun -e '
const fs = require("fs");
const settingsPath = process.env.GSTACK_SETTINGS_PATH;
const source = process.env.GSTACK_SOURCE;
let settings = {};
try { settings = JSON.parse(fs.readFileSync(settingsPath, "utf8")); } catch { process.exit(0); }
if (!settings.hooks) { process.exit(0); }
let removed = 0;
for (const event of Object.keys(settings.hooks)) {
const before = settings.hooks[event].length;
settings.hooks[event] = settings.hooks[event].filter(entry => entry._gstack_source !== source);
removed += before - settings.hooks[event].length;
if (settings.hooks[event].length === 0) delete settings.hooks[event];
}
if (Object.keys(settings.hooks).length === 0) delete settings.hooks;
const tmp = settingsPath + ".tmp";
fs.writeFileSync(tmp, JSON.stringify(settings, null, 2) + "\n");
fs.renameSync(tmp, settingsPath);
console.log("OK: removed " + removed + " hook entry/entries tagged source=" + source);
'
;;
rollback)
if [ ! -f "$SETTINGS_FILE.bak-latest" ]; then
echo "rollback: no backup pointer at $SETTINGS_FILE.bak-latest" >&2
exit 1
fi
LATEST=$(cat "$SETTINGS_FILE.bak-latest")
if [ ! -f "$LATEST" ]; then
echo "rollback: pointer references missing backup $LATEST" >&2
exit 1
fi
cp "$LATEST" "$SETTINGS_FILE"
echo "OK: restored $SETTINGS_FILE from $LATEST"
;;
list-sources)
[ -f "$SETTINGS_FILE" ] || { echo "(no settings file)"; exit 0; }
GSTACK_SETTINGS_PATH="$SETTINGS_FILE" bun -e '
const fs = require("fs");
let settings = {};
try { settings = JSON.parse(fs.readFileSync(process.env.GSTACK_SETTINGS_PATH, "utf8")); } catch { process.exit(0); }
const hooks = settings.hooks || {};
let any = false;
for (const event of Object.keys(hooks)) {
for (const entry of hooks[event]) {
if (entry._gstack_source) {
any = true;
console.log(event + "\t" + entry._gstack_source + "\t" + (entry.matcher || "(no matcher)"));
}
}
}
if (!any) console.log("(no gstack-tagged hooks)");
'
;;
*)
echo "Unknown action: $ACTION (expected add or remove)" >&2
echo "Unknown action: $ACTION" >&2
exit 1
;;
esac
+4
View File
@@ -232,6 +232,10 @@ SETTINGS_HOOK="$(dirname "$0")/gstack-settings-hook"
SESSION_UPDATE="$(dirname "$0")/gstack-session-update"
if [ -x "$SETTINGS_HOOK" ]; then
"$SETTINGS_HOOK" remove "$SESSION_UPDATE" 2>/dev/null && REMOVED+=("SessionStart hook") || true
# Cathedral T8 cleanup: also remove plan-tune PreToolUse + PostToolUse hooks.
if "$SETTINGS_HOOK" remove-source --source plan-tune-cathedral 2>/dev/null | grep -q "removed [1-9]"; then
REMOVED+=("plan-tune cathedral hooks")
fi
fi
# ─── Remove global state ────────────────────────────────────
+1
View File
@@ -921,6 +921,7 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
| `disconnect` | Disconnect headed browser, return to headless mode |
| `focus [@ref]` | Bring headed browser window to foreground (macOS) |
| `handoff [message]` | Open visible Chrome at current page for user takeover |
| `memory [--json]` | Snapshot Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes. JSON output with --json. |
| `restart` | Restart server |
| `resume` | Re-snapshot after user takeover, return control to AI |
| `state save|load <name>` | Save/load browser state (cookies + URLs) |
+188 -13
View File
@@ -18,9 +18,12 @@
import { chromium, type Browser, type BrowserContext, type BrowserContextOptions, type Page, type Locator, type Cookie } from 'playwright';
import { writeSecureFile, mkdirSecure } from './file-permissions';
import { addConsoleEntry, addNetworkEntry, addDialogEntry, networkBuffer, type DialogEntry } from './buffers';
import { emitActivity } from './activity';
import { validateNavigationUrl } from './url-validation';
import { TabSession, type RefEntry } from './tab-session';
import { resolveChromiumProfile, cleanSingletonLocks } from './config';
import { withCdpSession } from './cdp-bridge';
import type { MemorySnapshot, MemoryStructureStats, MemoryTabSnapshot, MemoryProcess } from './memory-snapshot';
/**
* Detect whether GSTACK_CHROMIUM_PATH points at a custom Chromium build that
@@ -194,6 +197,51 @@ export class BrowserManager {
private connectionMode: 'launched' | 'headed' = 'launched';
private intentionalDisconnect = false;
// ─── Tab Count Guardrail (D5 + Codex single-tab flag) ───────
// Idempotent threshold trackers: each guardrail fires exactly once per
// upward crossing of its threshold and re-arms when the tab count drops
// back below. Pre-guardrail, nothing tracked tab count growth and a
// user could accumulate hundreds of tabs (each holding 50300 MB of
// Chromium-side RSS) without warning until the OS OOM-killer fired.
// The toast UX lives in the sidebar (extension/sidepanel.js); the
// server-side responsibility is the audit-trail activity entry that
// appears in the activity feed even when the sidebar is closed.
private static readonly TAB_GUARDRAIL_SOFT = 50;
private static readonly TAB_GUARDRAIL_HARD = 200;
private tabGuardrailSoftHit = false;
private tabGuardrailHardHit = false;
/**
* Called from context.on('page') after a new tab is tracked. Emits at
* most one activity entry per upward crossing of each threshold.
*/
private checkTabGuardrails(): void {
const total = this.pages.size;
if (!this.tabGuardrailSoftHit && total >= BrowserManager.TAB_GUARDRAIL_SOFT) {
this.tabGuardrailSoftHit = true;
const msg = `Tab count crossed ${BrowserManager.TAB_GUARDRAIL_SOFT} (now ${total}). Consider closing unused tabs — each Chromium tab holds 50300 MB.`;
console.warn(`[browse] ${msg}`);
emitActivity({ type: 'error', command: 'tab-guardrail', error: msg, tabs: total });
}
if (!this.tabGuardrailHardHit && total >= BrowserManager.TAB_GUARDRAIL_HARD) {
this.tabGuardrailHardHit = true;
const msg = `Tab count crossed ${BrowserManager.TAB_GUARDRAIL_HARD} (now ${total}). OOM risk imminent. Open the sidebar to see top RAM consumers.`;
console.error(`[browse] ${msg}`);
emitActivity({ type: 'error', command: 'tab-guardrail', error: msg, tabs: total });
}
}
/** Called from page.on('close') so the guardrails re-arm. */
private recheckTabGuardrailsOnClose(): void {
const total = this.pages.size;
if (this.tabGuardrailSoftHit && total < BrowserManager.TAB_GUARDRAIL_SOFT) {
this.tabGuardrailSoftHit = false;
}
if (this.tabGuardrailHardHit && total < BrowserManager.TAB_GUARDRAIL_HARD) {
this.tabGuardrailHardHit = false;
}
}
// Called when the headed browser disconnects without intentional teardown
// (user closed the window). Wired up by server.ts to run full cleanup
// (sidebar-agent, state file, profile locks) before exiting with code 2.
@@ -620,6 +668,7 @@ export class BrowserManager {
// Inject indicator on the new tab
page.evaluate(indicatorScript).catch(() => {});
console.log(`[browse] New tab detected (id=${id}, total=${this.pages.size})`);
this.checkTabGuardrails();
});
// Persistent context opens a default page — adopt it instead of creating a new one
@@ -1004,6 +1053,116 @@ export class BrowserManager {
}
}
/**
* Diagnostic for `$B memory` and the /memory endpoint.
*
* Collects:
* - Bun process memory (cross-platform, accurate, no shelling).
* - Per-tab JS heap via CDP Performance.getMetrics the most portable
* per-tab signal CDP exposes. Misses native/GPU/Skia/cache memory
* (Codex flag on the eng-review; see follow-up TODO "native/GPU
* memory breakdown").
* - Chromium process tree via SystemInfo.getProcessInfo PID + type
* + CPU time. Per-process RSS is NOT exposed via CDP and the eng
* review (D2 USE_CDP) explicitly chose CDP over shelling to `ps`,
* so RSS columns are absent and `notes[]` says why.
*
* `structures` is passed in by the caller (read-commands / server) so
* browser-manager doesn't take a hard dep on every buffer-owning module.
*/
async getMemorySnapshot(structures: MemoryStructureStats): Promise<MemorySnapshot> {
const bunMem = process.memoryUsage();
const notes: string[] = [];
// Per-tab JS heap. Lazy: only the pages we already track. A target
// that died mid-snapshot is omitted, never throws.
const tabs: MemoryTabSnapshot[] = [];
for (const [id, page] of this.pages) {
try {
const url = (() => { try { return page.url(); } catch { return ''; } })();
const title = await page.title().catch(() => '');
const metrics = await withCdpSession(page, async (session) => {
await session.send('Performance.enable').catch(() => undefined);
const result = await session.send('Performance.getMetrics');
return ((result as { metrics?: Array<{ name: string; value: number }> }).metrics) ?? [];
});
const mm: Record<string, number> = {};
for (const m of metrics) mm[m.name] = m.value;
tabs.push({
id,
url,
title,
jsHeapUsed: mm.JSHeapUsedSize ?? 0,
jsHeapTotal: mm.JSHeapTotalSize ?? 0,
documents: mm.Documents ?? 0,
nodes: mm.Nodes ?? 0,
listeners: mm.JSEventListeners ?? 0,
});
} catch {
// Target died or CDP unavailable mid-snapshot — skip this tab.
}
}
// Chromium process tree. Browser handle may be on the `browser` field
// (launched mode) or accessible via `context.browser()` (persistent
// context / headed mode); try both.
let processes: MemoryProcess[] | null = null;
const browser: Browser | null = this.browser ?? (this.context ? this.context.browser() : null);
if (browser) {
try {
// `newBrowserCDPSession` is browser-wide. Not exposed on every
// Playwright TypeScript surface, but present at runtime on the
// Browser instance — use a typed cast to avoid the @ts-expect-error.
type BrowserWithCDP = Browser & {
newBrowserCDPSession?: () => Promise<{
send: (method: string, params?: unknown) => Promise<unknown>;
detach: () => Promise<void>;
}>;
};
const maybeFactory = (browser as BrowserWithCDP).newBrowserCDPSession;
if (typeof maybeFactory === 'function') {
const browserSession = await maybeFactory.call(browser);
try {
const info = (await browserSession.send('SystemInfo.getProcessInfo')) as {
processInfo?: Array<{ id: number; type: string; cpuTime: number }>;
};
processes = (info.processInfo ?? []).map((p) => ({
id: p.id,
type: p.type,
cpuTime: p.cpuTime,
}));
notes.push(
'Per-Chromium-process RSS not collected — SystemInfo.getProcessInfo exposes PID+type+CPU only. ' +
'See follow-up TODO "native/GPU memory breakdown" for the deferred fix.',
);
} finally {
await browserSession.detach().catch(() => undefined);
}
} else {
notes.push('Playwright build does not expose newBrowserCDPSession; per-process info skipped.');
}
} catch (err: any) {
notes.push(`CDP browser session unavailable: ${err?.message ?? String(err)}`);
}
} else {
notes.push('Browser handle unavailable (server connection mode); per-process info skipped.');
}
return {
bunServer: {
rss: bunMem.rss,
heapUsed: bunMem.heapUsed,
heapTotal: bunMem.heapTotal,
external: bunMem.external,
},
tabs,
processes,
structures,
capturedAt: Date.now(),
notes,
};
}
// ─── Ref Map (delegates to active session) ──────────────────
setRefMap(refs: Map<string, RefEntry>) {
this.getActiveSession().setRefMap(refs);
@@ -1530,6 +1689,7 @@ export class BrowserManager {
break;
}
}
this.recheckTabGuardrailsOnClose();
});
// Clear ref map on navigation — refs point to stale elements after page change
@@ -1598,23 +1758,38 @@ export class BrowserManager {
}
});
// Capture response sizes via response finished
// Capture response sizes via requestfinished — but DO NOT call
// response.body() here. Pre-fix, this listener materialized every
// response body across CDP just to read .length: multi-GB/hour of
// Buffer churn on long-lived headed Chromium with media-heavy
// pages, the primary Bun-side accelerant on the gbrowser-OOM
// investigation. req.sizes() pulls from the Network.loadingFinished
// event Chromium already emits — accurate for chunked transfer,
// gzip-compressed responses, and streaming media, all the cases
// where the previous Content-Length-header approach would have
// missed the size.
//
// The "single context-level CDP listener" architecture (D10's
// stretch goal — would reduce per-page listener count from N to 1
// via Target.setAutoAttach) is deferred. TODOS.md tracks it.
page.on('requestfinished', async (req) => {
try {
const res = await req.response();
if (res) {
const url = req.url();
const body = await res.body().catch(() => null);
const size = body ? body.length : 0;
for (let i = networkBuffer.length - 1; i >= 0; i--) {
const entry = networkBuffer.get(i);
if (entry && entry.url === url && !entry.size) {
networkBuffer.set(i, { ...entry, size });
break;
}
const sizes = await req.sizes().catch(() => null);
if (!sizes) return;
const url = req.url();
const size = sizes.responseBodySize ?? 0;
for (let i = networkBuffer.length - 1; i >= 0; i--) {
const entry = networkBuffer.get(i);
if (entry && entry.url === url && !entry.size) {
networkBuffer.set(i, { ...entry, size });
break;
}
}
} catch {}
} catch {
// Best-effort: requestfinished fires for aborted/cached requests too,
// where sizes() is unavailable. Missing size is acceptable; an
// unbounded throw would noise the console for every cache hit.
}
});
}
}
+75 -9
View File
@@ -25,18 +25,84 @@ import { logTelemetry } from './telemetry';
const CDP_TIMEOUT_MS = 5000;
const CDP_ACQUIRE_TIMEOUT_MS = 5000;
// Per-page CDPSession cache. Created lazily on first allow-listed call,
// cleaned up when the page closes.
// ─── CDP session lifecycle helpers ─────────────────────────────
//
// Every direct `newCDPSession(page)` call needs a matching `session.detach()`
// to release the Chromium-side CDP target. Forgetting the detach leaves the
// target attached until the underlying transport drops (often process exit),
// which on a long-lived headed browser shows up as steadily-climbing
// browser-process RSS. To make the leak class unforgettable, callers should
// go through one of these two helpers and a static-grep test
// (browse/test/cdp-session-cleanup.test.ts) fails CI if any source file
// calls `newCDPSession(` outside this module.
/**
* Ephemeral CDP session with try/finally detach. Use for one-shot CDP work
* where the caller doesn't need session reuse e.g. archive snapshots,
* `$B memory`, a single `Page.captureScreenshot`. The session is detached
* in `finally` regardless of whether `fn` threw, so the Chromium target
* doesn't leak on the error path.
*
* For repeated use of the same page (e.g. the `$B cdp` bridge or the
* inspector), use `getOrCreateCdpSession` instead it caches and detaches
* on page close.
*/
export async function withCdpSession<T>(
page: Page,
fn: (session: any) => Promise<T>,
): Promise<T> {
const session = await page.context().newCDPSession(page);
try {
return await fn(session);
} finally {
try {
await session.detach();
} catch {
// Best-effort cleanup. Session may already be detached (target closed,
// context recreated, browser disconnect). Swallowing all errors is the
// correct cleanup posture per CLAUDE.md "best-effort cleanup paths".
}
}
}
/**
* Cached long-lived CDP session keyed by Page. First call creates the
* session and registers a `page.once('close', ...)` hook that removes the
* cache entry AND calls `session.detach()`. Pre-helper code only removed
* the cache entry, leaving the Chromium-side target attached.
*
* Pass a caller-owned WeakMap so this helper doesn't impose a single global
* cache the `$B cdp` bridge and the inspector each keep their own session
* pool with different invariants (e.g. the inspector also detaches on
* `framenavigated` because DOM/CSS domain state is tied to the document).
*/
export async function getOrCreateCdpSession(
page: Page,
cache: WeakMap<Page, any>,
): Promise<any> {
let session = cache.get(page);
if (session) return session;
session = await page.context().newCDPSession(page);
cache.set(page, session);
page.once('close', () => {
cache.delete(page);
session.detach().catch(() => {
// Best-effort cleanup — see withCdpSession finally block.
});
});
return session;
}
// ─── $B cdp bridge ─────────────────────────────────────────────
// Per-page CDPSession cache. Lifecycle delegated to getOrCreateCdpSession
// which registers a close hook that BOTH removes the cache entry AND calls
// session.detach() — pre-helper code only did the former, leaving the
// Chromium-side target attached.
const sessionCache: WeakMap<Page, any> = new WeakMap();
async function getCdpSession(page: Page): Promise<any> {
let s = sessionCache.get(page);
if (s) return s;
s = await page.context().newCDPSession(page);
sessionCache.set(page, s);
// Clear cache on detach so we don't hold a stale handle.
page.once('close', () => sessionCache.delete(page));
return s;
return getOrCreateCdpSession(page, sessionCache);
}
export interface CdpDispatchInput {
+75 -9
View File
@@ -13,6 +13,7 @@
*/
import type { Page } from 'playwright';
import { getOrCreateCdpSession } from './cdp-bridge';
// ─── Types ──────────────────────────────────────────────────────
@@ -106,15 +107,23 @@ async function getOrCreateSession(page: Page): Promise<any> {
}
}
session = await page.context().newCDPSession(page);
cdpSessions.set(page, session);
session = await getOrCreateCdpSession(page, cdpSessions);
// Enable DOM and CSS domains
await session.send('DOM.enable');
await session.send('CSS.enable');
initializedPages.add(page);
// Enable DOM and CSS domains on first init for this page. The session
// itself is cached + close-detached by getOrCreateCdpSession; the
// initializedPages WeakSet is inspector-layer state that needs its
// own close hook to stay in sync.
if (!initializedPages.has(page)) {
await session.send('DOM.enable');
await session.send('CSS.enable');
initializedPages.add(page);
page.once('close', () => initializedPages.delete(page));
}
// Auto-detach on navigation
// Auto-detach on navigation — DOM/CSS domain state is tied to the
// document. Close-detach (from getOrCreateCdpSession) handles the
// tab-close case; framenavigated catches in-tab navigation that
// invalidates inspector state without closing the tab.
page.once('framenavigated', () => {
try {
session.detach().catch(() => {});
@@ -130,7 +139,41 @@ async function getOrCreateSession(page: Page): Promise<any> {
// ─── Modification History ───────────────────────────────────────
// Bounded FIFO of style modifications. Pre-cap, this was an unbounded
// module-scoped array that grew for every CSS edit made through $B css
// across the whole browser session — small per-entry footprint but no
// upper bound, the kind of slow leak that compounds over multi-day
// inspector use. The cap is 200 because per-session undo workflows
// rarely walk back more than a handful of edits, and a user who really
// wants to roll a long change back can `$B css reset` to revert all of
// them. totalPushed is monotonic across the session so undoModification
// can tell the user when their target index has been evicted, instead
// of just "no modification at index N".
const MOD_HISTORY_CAP = 200;
const modificationHistory: StyleModification[] = [];
let modHistoryTotalPushed = 0;
function pushModification(mod: StyleModification): void {
modificationHistory.push(mod);
modHistoryTotalPushed++;
while (modificationHistory.length > MOD_HISTORY_CAP) {
modificationHistory.shift();
}
}
// Test-only entry: exposes the history-cap mechanics (push, reset, cap value)
// without requiring a CDP-driven Page. Production code must go through
// modifyStyle / undoModification / resetModifications.
export const __testInternals = {
pushModification,
MOD_HISTORY_CAP,
getRawHistory: () => modificationHistory.slice(),
getTotalPushed: () => modHistoryTotalPushed,
resetForTest: () => {
modificationHistory.length = 0;
modHistoryTotalPushed = 0;
},
};
// ─── Specificity Calculation ────────────────────────────────────
@@ -559,7 +602,7 @@ export async function modifyStyle(
method,
};
modificationHistory.push(modification);
pushModification(modification);
return modification;
}
@@ -569,7 +612,12 @@ export async function modifyStyle(
export async function undoModification(page: Page, index?: number): Promise<void> {
const idx = index ?? modificationHistory.length - 1;
if (idx < 0 || idx >= modificationHistory.length) {
throw new Error(`No modification at index ${idx}. History has ${modificationHistory.length} entries.`);
const evictedNote = modHistoryTotalPushed > MOD_HISTORY_CAP
? ` (most recent ${MOD_HISTORY_CAP} only — ${modHistoryTotalPushed - MOD_HISTORY_CAP} earlier entries evicted at the cap)`
: '';
throw new Error(
`No modification at index ${idx}. History has ${modificationHistory.length} entries${evictedNote}.`,
);
}
const mod = modificationHistory[idx];
@@ -622,6 +670,23 @@ export function getModificationHistory(): StyleModification[] {
return [...modificationHistory];
}
/**
* Diagnostic accessor for the $B memory snapshot. Returns current buffer
* occupancy, the cap, and how many entries have been evicted since the
* last reset.
*/
export function getModificationHistoryStats(): {
current: number;
cap: number;
evicted: number;
} {
return {
current: modificationHistory.length,
cap: MOD_HISTORY_CAP,
evicted: Math.max(0, modHistoryTotalPushed - MOD_HISTORY_CAP),
};
}
/**
* Reset all modifications, restoring original values.
*/
@@ -648,6 +713,7 @@ export async function resetModifications(page: Page): Promise<void> {
}
}
modificationHistory.length = 0;
modHistoryTotalPushed = 0;
}
/**
+2
View File
@@ -45,6 +45,7 @@ export const META_COMMANDS = new Set([
'domain-skill',
'skill',
'cdp',
'memory',
]);
export const ALL_COMMANDS = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]);
@@ -89,6 +90,7 @@ export function wrapUntrustedContent(result: string, url: string): string {
export const COMMAND_DESCRIPTIONS: Record<string, { category: string; description: string; usage?: string }> = {
// Navigation
'memory': { category: 'Server', description: 'Snapshot Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes. JSON output with --json.', usage: 'memory [--json]' },
'goto': { category: 'Navigation', description: 'Navigate to URL (http://, https://, or file:// scoped to cwd/TEMP_DIR)', usage: 'goto <url>' },
'load-html': { category: 'Navigation', description: 'Load HTML via setContent. Accepts a file path under safe-dirs (validated), OR --from-file <payload.json> with {"html":"...","waitUntil":"..."} for large inline HTML (Windows argv safe).', usage: 'load-html <file> [--wait-until load|domcontentloaded|networkidle] [--tab-id <N>] | load-html --from-file <payload.json> [--tab-id <N>]' },
'back': { category: 'Navigation', description: 'History back' },
+115
View File
@@ -0,0 +1,115 @@
// `$B memory` — diagnostic snapshot of Bun heap + per-tab JS heap +
// Chromium process tree + bounded buffer sizes. Lives in its own file
// because the meta-commands dispatcher imports it lazily — projects
// that never run the diagnostic don't pay the import-graph cost (CDP
// bridge, memory-snapshot types, buffer accessors).
import type { BrowserManager } from './browser-manager';
import { formatBytes, type MemorySnapshot, type MemoryStructureStats } from './memory-snapshot';
import { getModificationHistoryStats } from './cdp-inspector';
import { getSubscriberCount as getActivitySubscriberCount } from './activity';
import { getInspectorSubscriberCount } from './server';
import { consoleBuffer, networkBuffer, dialogBuffer } from './buffers';
import { getCaptureBuffer } from './network-capture';
/**
* Assemble the MemoryStructureStats from the modules that own each buffer.
* Browser-manager doesn't take a hard dep on every buffer-owning module
* the snapshot caller passes them in.
*/
function collectStructureStats(): MemoryStructureStats {
return {
modificationHistory: getModificationHistoryStats(),
activitySubscribers: getActivitySubscriberCount(),
inspectorSubscribers: getInspectorSubscriberCount(),
consoleBufferLen: consoleBuffer.length,
networkBufferLen: networkBuffer.length,
dialogBufferLen: dialogBuffer.length,
captureBufferBytes: getCaptureBuffer().byteSize,
};
}
/**
* Pretty-print the snapshot for terminal output. JSON mode (--json) goes
* straight through JSON.stringify so the extension footer and any test
* harness can consume it programmatically.
*/
function formatSnapshotText(s: MemorySnapshot): string {
const lines: string[] = [];
lines.push(
`Bun server: RSS: ${formatBytes(s.bunServer.rss)} ` +
`heap: ${formatBytes(s.bunServer.heapUsed)} / ${formatBytes(s.bunServer.heapTotal)} ` +
`external: ${formatBytes(s.bunServer.external)}`,
);
if (s.processes && s.processes.length > 0) {
// Group by type so the user sees "renderer: 12" vs listing 12 separate rows.
const byType: Record<string, number> = {};
for (const p of s.processes) byType[p.type] = (byType[p.type] ?? 0) + 1;
const typeSummary = Object.entries(byType)
.map(([t, n]) => `${t}=${n}`)
.join(' ');
lines.push(`Chromium processes: ${s.processes.length} total (${typeSummary})`);
} else if (s.processes === null) {
lines.push('Chromium processes: (unavailable — see notes)');
} else {
lines.push('Chromium processes: 0');
}
if (s.tabs.length > 0) {
// Sort by JS heap descending; show top 10 plus "...N more" tail.
const sorted = [...s.tabs].sort((a, b) => b.jsHeapUsed - a.jsHeapUsed);
const shown = sorted.slice(0, 10);
lines.push(`Renderers: ${s.tabs.length} tabs (top by JS heap):`);
for (const t of shown) {
const urlShort = t.url.length > 80 ? t.url.slice(0, 77) + '...' : t.url;
lines.push(
` [${formatBytes(t.jsHeapUsed).padStart(8)} JS, ` +
`${String(t.nodes).padStart(6)} nodes, ` +
`${String(t.listeners).padStart(5)} listeners] ` +
`tab #${t.id}${urlShort}`,
);
}
if (sorted.length > shown.length) {
lines.push(` ...and ${sorted.length - shown.length} more`);
}
} else {
lines.push('Renderers: (no tabs tracked)');
}
lines.push('─────────────────────────────────────────────────');
lines.push('In-memory structures (Bun side):');
const m = s.structures.modificationHistory;
lines.push(
` modificationHistory: ${m.current} / ${m.cap} entries` +
(m.evicted > 0 ? ` (${m.evicted} evicted since reset)` : ''),
);
lines.push(` inspectorSubscribers: ${s.structures.inspectorSubscribers}`);
lines.push(` activitySubscribers: ${s.structures.activitySubscribers}`);
lines.push(` consoleBuffer: ${s.structures.consoleBufferLen} entries`);
lines.push(` networkBuffer: ${s.structures.networkBufferLen} entries`);
lines.push(` dialogBuffer: ${s.structures.dialogBufferLen} entries`);
lines.push(` captureBuffer: ${formatBytes(s.structures.captureBufferBytes)}`);
if (s.notes.length > 0) {
lines.push('');
lines.push('Notes:');
for (const n of s.notes) lines.push(` - ${n}`);
}
return lines.join('\n');
}
export async function handleMemoryCommand(args: string[], bm: BrowserManager): Promise<string> {
const jsonMode = args.includes('--json');
const structures = collectStructureStats();
const snapshot = await bm.getMemorySnapshot(structures);
if (jsonMode) return JSON.stringify(snapshot);
return formatSnapshotText(snapshot);
}
/** Entry point used by the /memory HTTP endpoint — same data, always JSON. */
export async function buildMemorySnapshotJson(bm: BrowserManager): Promise<MemorySnapshot> {
const structures = collectStructureStats();
return bm.getMemorySnapshot(structures);
}
+73
View File
@@ -0,0 +1,73 @@
// Shared types for the $B memory diagnostic command and the /memory
// endpoint. Lives in its own module so server.ts, read-commands.ts, and
// the extension footer poll can import without taking a circular dep on
// browser-manager.ts.
//
// Background: the gbrowser-OOM investigation (160 GB Activity Monitor
// reading on a friend's machine) needed a diagnostic that could land
// before the next incident — measurement comes first, fixes come after.
// $B memory is that diagnostic.
/** Counts/bytes for the bounded in-memory structures on the Bun side. */
export interface MemoryStructureStats {
modificationHistory: { current: number; cap: number; evicted: number };
activitySubscribers: number;
inspectorSubscribers: number;
consoleBufferLen: number;
networkBufferLen: number;
dialogBufferLen: number;
captureBufferBytes: number;
}
/** Per-tab JS heap snapshot (CDP Performance.getMetrics). */
export interface MemoryTabSnapshot {
id: number;
url: string;
title: string;
jsHeapUsed: number;
jsHeapTotal: number;
documents: number;
nodes: number;
listeners: number;
}
/** Chromium process metadata via CDP SystemInfo.getProcessInfo. */
export interface MemoryProcess {
/** Chromium-internal process id (not OS PID). */
id: number;
/** 'browser' | 'renderer' | 'gpu' | 'utility' | 'extension' | ... */
type: string;
/** CPU time accumulated since process start (seconds). */
cpuTime: number;
}
export interface MemorySnapshot {
bunServer: {
rss: number;
heapUsed: number;
heapTotal: number;
external: number;
};
tabs: MemoryTabSnapshot[];
/**
* Chromium process tree. `null` when no browser handle is available
* (server in connection mode, or browser not yet launched).
*
* Per-process RSS is NOT included: SystemInfo.getProcessInfo returns
* id+type+cpuTime but Chromium does not expose RSS via CDP. The
* `notes[]` field tells the caller why see the follow-up TODO
* "native/GPU memory breakdown" for the deferred fix.
*/
processes: MemoryProcess[] | null;
structures: MemoryStructureStats;
capturedAt: number;
notes: string[];
}
/** Format bytes as a short human string ("1.4 GB", "312 MB", "84 KB"). */
export function formatBytes(n: number): string {
if (n < 1024) return `${n} B`;
if (n < 1024 * 1024) return `${(n / 1024).toFixed(1)} KB`;
if (n < 1024 * 1024 * 1024) return `${(n / 1024 / 1024).toFixed(1)} MB`;
return `${(n / 1024 / 1024 / 1024).toFixed(2)} GB`;
}
+7
View File
@@ -1161,6 +1161,13 @@ export async function handleMetaCommand(
return await handleCdpCommand(args, bm);
}
case 'memory': {
// Lazy import — pulls in cdp-bridge + memory-snapshot + buffer accessors
// that aren't useful for projects that never run the diagnostic.
const { handleMemoryCommand } = await import('./memory-command');
return await handleMemoryCommand(args, bm);
}
default:
throw new Error(`Unknown meta command: ${command}`);
}
+55 -108
View File
@@ -38,6 +38,7 @@ import {
import { validateTempPath } from './path-security';
import { resolveConfig, ensureStateDir, readVersionHash, resolveChromiumProfile, cleanSingletonLocks } from './config';
import { emitActivity, subscribe, getActivityAfter, getActivityHistory, getSubscriberCount } from './activity';
import { createSseEndpoint } from './sse-helpers';
import { initAuditLog, writeAuditEntry } from './audit';
import { inspectElement, modifyStyle, resetModifications, getModificationHistory, detachSession, type InspectorResult } from './cdp-inspector';
// Bun.spawn used instead of child_process.spawn (compiled bun binaries
@@ -723,6 +724,11 @@ let inspectorTimestamp: number = 0;
type InspectorSubscriber = (event: any) => void;
const inspectorSubscribers = new Set<InspectorSubscriber>();
/** Diagnostic accessor used by the $B memory snapshot. */
export function getInspectorSubscriberCount(): number {
return inspectorSubscribers.size;
}
function emitInspectorEvent(event: any): void {
for (const notify of inspectorSubscribers) {
queueMicrotask(() => {
@@ -2432,62 +2438,19 @@ export function buildFetchHandler(cfg: ServerConfig): ServerHandle {
});
}
const afterId = parseInt(url.searchParams.get('after') || '0', 10);
const encoder = new TextEncoder();
const stream = new ReadableStream({
start(controller) {
// SSE egress invariant: every JSON.stringify here ships page-content-derived
// fields (URLs, command args, errors) to the sidebar. Lone surrogates must
// be sanitized DURING stringify (via sanitizeReplacer) so they're cleaned
// before escape-encoding — post-stringify regex is ineffective because
// JSON.stringify has already converted \uD800 → "\\ud800".
// 1. Gap detection + replay
// Cleanup contract (abort + enqueue-fail + heartbeat-fail, all
// idempotent) lives in createSseEndpoint; sanitizeReplacer is
// applied to every JSON.stringify inside the helper, so
// page-content-derived fields (URLs, command args, errors)
// stay surrogate-safe per CLAUDE.md egress invariant.
return createSseEndpoint(req, {
initialReplay: (send) => {
const { entries, gap, gapFrom, availableFrom } = getActivityAfter(afterId);
if (gap) {
controller.enqueue(encoder.encode(`event: gap\ndata: ${JSON.stringify({ gapFrom, availableFrom }, sanitizeReplacer)}\n\n`));
}
for (const entry of entries) {
controller.enqueue(encoder.encode(`event: activity\ndata: ${JSON.stringify(entry, sanitizeReplacer)}\n\n`));
}
// 2. Subscribe for live events
const unsubscribe = subscribe((entry) => {
try {
controller.enqueue(encoder.encode(`event: activity\ndata: ${JSON.stringify(entry, sanitizeReplacer)}\n\n`));
} catch (err: any) {
console.debug('[browse] Activity SSE stream error, unsubscribing:', err.message);
unsubscribe();
}
});
// 3. Heartbeat every 15s
const heartbeat = setInterval(() => {
try {
controller.enqueue(encoder.encode(`: heartbeat\n\n`));
} catch (err: any) {
console.debug('[browse] Activity SSE heartbeat failed:', err.message);
clearInterval(heartbeat);
unsubscribe();
}
}, 15000);
// 4. Cleanup on disconnect
req.signal.addEventListener('abort', () => {
clearInterval(heartbeat);
unsubscribe();
try { controller.close(); } catch {
// Expected: stream already closed
}
});
},
});
return new Response(stream, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
if (gap) send('gap', { gapFrom, availableFrom });
for (const entry of entries) send('activity', entry);
},
subscribe,
liveEventName: 'activity',
});
}
@@ -2796,6 +2759,32 @@ export function buildFetchHandler(cfg: ServerConfig): ServerHandle {
});
}
// GET /memory — diagnostic snapshot (auth required, does NOT reset idle).
// Same auth model as /activity/stream and /inspector/events: Bearer header
// OR view-only SSE-session cookie. Does NOT extend /health (which already
// leaks AUTH_TOKEN to any localhost caller in headed mode — see TODOS.md
// "Audit /health token distribution"); a separate endpoint with the
// standard SSE auth keeps the future /health fix from cascading into the
// sidebar footer poll.
if (url.pathname === '/memory' && req.method === 'GET') {
const cookieToken = extractSseCookie(req);
if (!validateAuth(req) && !validateSseSessionToken(cookieToken)) {
return new Response(JSON.stringify({ error: 'Unauthorized' }), {
status: 401, headers: { 'Content-Type': 'application/json' },
});
}
const { buildMemorySnapshotJson } = await import('./memory-command');
const snapshot = await buildMemorySnapshotJson(cfgBrowserManager);
// sanitizeReplacer is required at every SSE/JSON egress that ships
// page-content-derived strings — tab.url and tab.title come from
// page content, so lone-surrogate bytes from broken emoji or
// mid-emoji splits could otherwise reach the sidebar / Claude API.
return new Response(JSON.stringify(snapshot, sanitizeReplacer), {
status: 200,
headers: { 'Content-Type': 'application/json' },
});
}
// GET /inspector/events — SSE for inspector state changes (auth required)
if (url.pathname === '/inspector/events' && req.method === 'GET') {
// Same auth model as /activity/stream: Bearer OR view-only cookie.
@@ -2806,62 +2795,20 @@ export function buildFetchHandler(cfg: ServerConfig): ServerHandle {
status: 401, headers: { 'Content-Type': 'application/json' },
});
}
const encoder = new TextEncoder();
const stream = new ReadableStream({
start(controller) {
// SSE egress invariant: inspectorData and CDP event payloads carry
// page-DOM strings (selectors, attribute values, console messages).
// sanitizeReplacer cleans lone surrogates DURING JSON.stringify so
// they're neutralized before escape-encoding (post-stringify regex
// is a no-op once \uD800 has become "\\ud800").
// Send current state immediately
if (inspectorData) {
controller.enqueue(encoder.encode(
`event: state\ndata: ${JSON.stringify({ data: inspectorData, timestamp: inspectorTimestamp }, sanitizeReplacer)}\n\n`
));
}
// Subscribe for live events
const notify: InspectorSubscriber = (event) => {
try {
controller.enqueue(encoder.encode(
`event: inspector\ndata: ${JSON.stringify(event, sanitizeReplacer)}\n\n`
));
} catch (err: any) {
console.debug('[browse] Inspector SSE stream error:', err.message);
inspectorSubscribers.delete(notify);
}
};
// Cleanup contract (abort + enqueue-fail + heartbeat-fail,
// idempotent) lives in createSseEndpoint; sanitizeReplacer is
// applied to every JSON.stringify inside the helper. The
// inspector subscriber set stays here because it's also written
// to by emitInspectorEvent above.
return createSseEndpoint(req, {
initialReplay: inspectorData
? (send) => send('state', { data: inspectorData, timestamp: inspectorTimestamp })
: undefined,
subscribe: (notify) => {
inspectorSubscribers.add(notify);
// Heartbeat every 15s
const heartbeat = setInterval(() => {
try {
controller.enqueue(encoder.encode(`: heartbeat\n\n`));
} catch (err: any) {
console.debug('[browse] Inspector SSE heartbeat failed:', err.message);
clearInterval(heartbeat);
inspectorSubscribers.delete(notify);
}
}, 15000);
// Cleanup on disconnect
req.signal.addEventListener('abort', () => {
clearInterval(heartbeat);
inspectorSubscribers.delete(notify);
try { controller.close(); } catch (err: any) {
// Expected: stream already closed
}
});
},
});
return new Response(stream, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
return () => inspectorSubscribers.delete(notify);
},
liveEventName: 'inspector',
});
}
+154
View File
@@ -0,0 +1,154 @@
// SSE endpoint helper — shared cleanup contract for stream endpoints.
//
// Pre-helper, /activity/stream and /inspector/events implemented the same
// pattern in parallel and both leaked subscribers when enqueue failed
// without a corresponding abort signal (e.g. Chromium MV3 service-worker
// suspend dropped the TCP without an abort edge). The subscriber closure
// stayed in the Set, capturing the ReadableStreamDefaultController plus
// any payloads queued behind it. Over a multi-day sidebar session this
// compounded into multi-MB of retained controllers per dead connection.
//
// Centralizing the cleanup contract here means any future SSE endpoint
// inherits the invariant — cleanup runs on abort, enqueue failure, AND
// heartbeat failure, exactly once, regardless of which edge fires first.
import { stripLoneSurrogates } from './sanitize';
/**
* JSON.stringify replacer that strips lone UTF-16 surrogates from string
* values before they get escape-encoded. Pair with stringify when the
* consumer will JSON.parse the payload back into JS strings (SSE clients
* do this). Required at every SSE egress that ships page-content-derived
* fields see CLAUDE.md "Unicode sanitization at server egress".
*/
function sanitizeReplacer(_key: string, value: unknown): unknown {
return typeof value === 'string' ? stripLoneSurrogates(value) : value;
}
/** Send an SSE event. Handles JSON encoding + lone-surrogate sanitization. */
export type SseSender = (event: string, data: unknown) => void;
export interface SseEndpointConfig<T> {
/**
* Optional. Runs once after the stream opens, before subscribing for live
* events. Use for initial event replay (activity gap detection, history
* burst) or a current-state snapshot (inspector). The `send` helper
* handles JSON encoding with sanitizeReplacer and SSE framing; pass
* any event name and any payload object.
*/
initialReplay?: (send: SseSender) => void;
/**
* Subscribe to the live event source. Receives a `notify` callback;
* returns an unsubscribe function. The callback routes through the
* helper's safeEnqueue + cleanup-on-throw, so a dead consumer ends up
* removed from the subscriber set on the very next event (instead of
* waiting for an abort that may never fire).
*/
subscribe: (notify: (entry: T) => void) => () => void;
/**
* SSE event name for live events. `data: <JSON.stringify(entry)>\n\n`
* is wrapped automatically. /activity/stream uses 'activity';
* /inspector/events uses 'inspector'.
*/
liveEventName: string;
/** Heartbeat interval in ms. Default: 15000. */
heartbeatMs?: number;
}
/**
* Build a streaming Response that owns the cleanup contract:
* - safeEnqueue catches enqueue throws cleanup
* - 15s heartbeat catches dead peers; failure cleanup
* - req.signal abort cleanup
* - cleanup is idempotent (clearInterval + unsubscribe + try close)
*/
export function createSseEndpoint<T>(
req: Request,
config: SseEndpointConfig<T>,
): Response {
const heartbeatMs = config.heartbeatMs ?? 15000;
const encoder = new TextEncoder();
const stream = new ReadableStream({
start(controller) {
let cleanedUp = false;
let heartbeat: ReturnType<typeof setInterval> | null = null;
let unsubscribe: (() => void) | null = null;
const cleanup = (): void => {
if (cleanedUp) return;
cleanedUp = true;
if (heartbeat !== null) {
clearInterval(heartbeat);
heartbeat = null;
}
if (unsubscribe !== null) {
unsubscribe();
unsubscribe = null;
}
try {
controller.close();
} catch {
// Expected: stream already closed by the consumer.
}
};
const send: SseSender = (event, data) => {
if (cleanedUp) return;
try {
controller.enqueue(
encoder.encode(
`event: ${event}\ndata: ${JSON.stringify(data, sanitizeReplacer)}\n\n`,
),
);
} catch {
// Consumer disconnected mid-write. Tear down so this subscriber
// doesn't sit in the set forever.
cleanup();
}
};
// Initial replay (caller-provided).
if (config.initialReplay) {
try {
config.initialReplay(send);
} catch {
cleanup();
return;
}
if (cleanedUp) return;
}
// Subscribe for live events.
unsubscribe = config.subscribe((entry) => {
send(config.liveEventName, entry);
});
// Heartbeat keeps NAT boxes and proxies from dropping idle SSE,
// and serves as a liveness probe: an enqueue failure here is the
// cheapest way to learn the consumer is gone without waiting for
// an abort signal that may never arrive.
heartbeat = setInterval(() => {
if (cleanedUp) return;
try {
controller.enqueue(encoder.encode(`: heartbeat\n\n`));
} catch {
cleanup();
}
}, heartbeatMs);
req.signal.addEventListener('abort', cleanup);
},
});
return new Response(stream, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
},
});
}
+5 -3
View File
@@ -18,6 +18,7 @@ import type { SetContentWaitUntil } from './tab-session';
import { TEMP_DIR, isPathWithin } from './platform';
import { SAFE_DIRECTORIES } from './path-security';
import { modifyStyle, undoModification, resetModifications, getModificationHistory } from './cdp-inspector';
import { withCdpSession } from './cdp-bridge';
/**
* Aggressive page cleanup selectors and heuristics.
@@ -1409,9 +1410,10 @@ export async function handleWriteCommand(
validateOutputPath(outputPath);
try {
const cdp = await page.context().newCDPSession(page);
const { data } = await cdp.send('Page.captureSnapshot', { format: 'mhtml' });
await cdp.detach();
const data = await withCdpSession(page, async (cdp) => {
const result = await cdp.send('Page.captureSnapshot', { format: 'mhtml' });
return (result as { data: string }).data;
});
fs.writeFileSync(outputPath, data);
return `Archive saved: ${outputPath} (${Math.round(data.length / 1024)}KB, MHTML)`;
} catch (err: any) {
@@ -0,0 +1,95 @@
import { describe, test, expect, beforeEach } from 'bun:test';
import type { Page } from 'playwright';
import {
__testInternals,
undoModification,
} from '../src/cdp-inspector';
// Regression tests for the modificationHistory cap (D6 / smoking gun #2).
// Pre-cap, the module-scoped array grew unbounded across the session. Cap is
// 200 entries, oldest evicted on push past the cap. undoModification reports
// "evicted at the cap" in the error message so a user who asks for a
// no-longer-available index understands what happened (instead of seeing the
// pre-cap "No modification at index 500" with no context).
const { pushModification, MOD_HISTORY_CAP, getRawHistory, getTotalPushed, resetForTest } = __testInternals;
function fakeMod(id: number) {
return {
selector: `#node-${id}`,
property: 'color',
oldValue: 'red',
newValue: 'blue',
source: 'inline' as const,
timestamp: id,
method: 'setProperty' as 'setProperty',
};
}
beforeEach(() => {
resetForTest();
});
describe('modificationHistory cap', () => {
test('1. push under cap keeps every entry', () => {
for (let i = 0; i < 50; i++) pushModification(fakeMod(i));
expect(getRawHistory().length).toBe(50);
expect(getTotalPushed()).toBe(50);
expect(getRawHistory()[0].timestamp).toBe(0);
expect(getRawHistory()[49].timestamp).toBe(49);
});
test('2. push exactly cap keeps every entry', () => {
for (let i = 0; i < MOD_HISTORY_CAP; i++) pushModification(fakeMod(i));
expect(getRawHistory().length).toBe(MOD_HISTORY_CAP);
expect(getTotalPushed()).toBe(MOD_HISTORY_CAP);
expect(getRawHistory()[0].timestamp).toBe(0);
});
test('3. push past cap evicts oldest, keeps length at cap', () => {
const total = MOD_HISTORY_CAP + 50;
for (let i = 0; i < total; i++) pushModification(fakeMod(i));
expect(getRawHistory().length).toBe(MOD_HISTORY_CAP);
expect(getTotalPushed()).toBe(total);
// Oldest 50 dropped — entry that was #0 is gone; new oldest is #50.
expect(getRawHistory()[0].timestamp).toBe(50);
expect(getRawHistory()[MOD_HISTORY_CAP - 1].timestamp).toBe(total - 1);
});
test('4. resetForTest clears both buffer and totalPushed', () => {
for (let i = 0; i < 10; i++) pushModification(fakeMod(i));
resetForTest();
expect(getRawHistory().length).toBe(0);
expect(getTotalPushed()).toBe(0);
});
});
describe('undoModification eviction-aware error', () => {
// Stub Page: undoModification throws before any await when idx is out of
// range, so the stub never actually gets called.
const stubPage = {} as unknown as Page;
test('5. out-of-range BEFORE any eviction → no evicted note', async () => {
for (let i = 0; i < 5; i++) pushModification(fakeMod(i));
await expect(undoModification(stubPage, 99)).rejects.toThrow(
'No modification at index 99. History has 5 entries.',
);
});
test('6. out-of-range AFTER eviction → message names the evicted count', async () => {
const total = MOD_HISTORY_CAP + 73;
for (let i = 0; i < total; i++) pushModification(fakeMod(i));
// 273 pushed, 200 in buffer, 73 evicted. Ask for idx=400 (above buffer).
await expect(undoModification(stubPage, 400)).rejects.toThrow(
`No modification at index 400. History has ${MOD_HISTORY_CAP} entries ` +
`(most recent ${MOD_HISTORY_CAP} only — 73 earlier entries evicted at the cap).`,
);
});
test('7. negative explicit index throws cleanly (no NaN propagation)', async () => {
for (let i = 0; i < 10; i++) pushModification(fakeMod(i));
await expect(undoModification(stubPage, -1)).rejects.toThrow(
'No modification at index -1.',
);
});
});
+171
View File
@@ -0,0 +1,171 @@
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import type { Page } from 'playwright';
import { withCdpSession, getOrCreateCdpSession } from '../src/cdp-bridge';
// Static-grep tripwire + behavior tests for the CDP session lifecycle
// helpers introduced as part of the D11 EXPAND_SCOPE memory-leak fix.
//
// Direct calls to `page.context().newCDPSession(page)` are the leak class
// the helpers exist to close — every direct call needs a matching
// `session.detach()` and forgetting it leaves the Chromium-side target
// attached until the underlying transport drops. The tripwire fails CI
// if any source file calls `newCDPSession(` outside `cdp-bridge.ts`
// (the file that owns the helpers).
//
// Pattern mirrors browse/test/terminal-agent-pid-identity.test.ts and
// browse/test/server-sanitize-surrogates.test.ts: read source files
// directly, assert an invariant on their contents.
const SRC_DIR = path.resolve(new URL(import.meta.url).pathname, '..', '..', 'src');
function readAllSourceFiles(): Array<{ file: string; content: string }> {
const out: Array<{ file: string; content: string }> = [];
for (const entry of fs.readdirSync(SRC_DIR)) {
if (!entry.endsWith('.ts')) continue;
const full = path.join(SRC_DIR, entry);
out.push({ file: entry, content: fs.readFileSync(full, 'utf-8') });
}
return out;
}
describe('CDP session cleanup invariant', () => {
test('1. no source file calls `newCDPSession(` outside cdp-bridge.ts', () => {
const offenders: Array<{ file: string; line: number; text: string }> = [];
for (const { file, content } of readAllSourceFiles()) {
// The helper file is the ONE allowed home for direct newCDPSession calls.
if (file === 'cdp-bridge.ts') continue;
const lines = content.split('\n');
for (let i = 0; i < lines.length; i++) {
const line = lines[i];
if (!/newCDPSession\s*\(/.test(line)) continue;
// Skip comment lines — documentation mentions are fine.
const trimmed = line.trim();
if (trimmed.startsWith('//') || trimmed.startsWith('*')) continue;
offenders.push({ file, line: i + 1, text: trimmed });
}
}
if (offenders.length > 0) {
const formatted = offenders
.map((o) => ` ${o.file}:${o.line} ${o.text}`)
.join('\n');
throw new Error(
`Direct newCDPSession(...) calls found outside cdp-bridge.ts. ` +
`Route through withCdpSession() (one-shot, finally-detach) or ` +
`getOrCreateCdpSession() (cached, close-detach) instead:\n${formatted}`,
);
}
expect(offenders).toEqual([]);
});
test('2. helper file exports the two documented entry points', () => {
// Sanity: the tripwire is meaningless if the helpers themselves are gone.
expect(typeof withCdpSession).toBe('function');
expect(typeof getOrCreateCdpSession).toBe('function');
});
});
describe('withCdpSession finally-detach', () => {
// Fake Page surface for unit-testing the helper without spinning up a real
// browser. The helper only touches page.context().newCDPSession(page) and
// the returned session's .detach(), so this surface is enough.
function makeFakePage(detachSpy: { called: number; rejected?: Error }) {
const session = {
detach: async () => {
detachSpy.called++;
if (detachSpy.rejected) throw detachSpy.rejected;
},
};
return {
context: () => ({
newCDPSession: async (_p: unknown) => session,
}),
} as unknown as Page;
}
test('3. detaches on the success path', async () => {
const detachSpy = { called: 0 };
const page = makeFakePage(detachSpy);
const result = await withCdpSession(page, async (session) => {
expect(session).toBeDefined();
return 42;
});
expect(result).toBe(42);
expect(detachSpy.called).toBe(1);
});
test('4. detaches even when fn throws (the actual leak fix)', async () => {
const detachSpy = { called: 0 };
const page = makeFakePage(detachSpy);
await expect(
withCdpSession(page, async () => {
throw new Error('boom');
}),
).rejects.toThrow('boom');
expect(detachSpy.called).toBe(1);
});
test('5. swallows detach errors so they do not mask fn errors', async () => {
const detachSpy = { called: 0, rejected: new Error('already detached') };
const page = makeFakePage(detachSpy);
await expect(
withCdpSession(page, async () => {
throw new Error('original');
}),
).rejects.toThrow('original');
expect(detachSpy.called).toBe(1);
});
test('6. swallows detach errors on the success path too', async () => {
const detachSpy = { called: 0, rejected: new Error('target closed') };
const page = makeFakePage(detachSpy);
const result = await withCdpSession(page, async () => 'ok');
expect(result).toBe('ok');
expect(detachSpy.called).toBe(1);
});
});
describe('getOrCreateCdpSession close-detach', () => {
function makeFakePage() {
const closeListeners: Array<() => void> = [];
const session = {
detach: async () => {
session._detachCount++;
},
_detachCount: 0,
};
const page = {
context: () => ({
newCDPSession: async (_p: unknown) => session,
}),
once: (event: string, fn: () => void) => {
if (event === 'close') closeListeners.push(fn);
},
_fireClose: () => {
for (const fn of closeListeners) fn();
},
};
return { page: page as unknown as Page, session, fireClose: page._fireClose };
}
test('7. caches the session across calls', async () => {
const { page } = makeFakePage();
const cache = new WeakMap<Page, any>();
const s1 = await getOrCreateCdpSession(page, cache);
const s2 = await getOrCreateCdpSession(page, cache);
expect(s1).toBe(s2);
});
test('8. close hook detaches the session AND clears the cache', async () => {
const { page, session, fireClose } = makeFakePage();
const cache = new WeakMap<Page, any>();
await getOrCreateCdpSession(page, cache);
expect(cache.get(page)).toBeDefined();
fireClose();
// Detach runs synchronously up to the await in the close hook; let it settle.
await new Promise((r) => setTimeout(r, 0));
expect(cache.get(page)).toBeUndefined();
expect(session._detachCount).toBe(1);
});
});
+247
View File
@@ -0,0 +1,247 @@
import { describe, test, expect } from 'bun:test';
import { formatBytes, type MemorySnapshot, type MemoryStructureStats } from '../src/memory-snapshot';
// Unit coverage for the $B memory diagnostic surface — formatter, byte
// renderer, and the structures-stats aggregator. The integration path
// ($B memory through the BrowserManager → CDP) requires a real headless
// Chromium and is covered indirectly by browse-basic in the eval suite.
// These tests pin the renderer logic in isolation so format regressions
// (rounded GB drift, missing "and N more" tail, snapshot.notes ordering)
// surface immediately.
// ─── formatBytes() ─────────────────────────────────────────────
describe('formatBytes', () => {
test('1. < 1 KB renders as bytes', () => {
expect(formatBytes(0)).toBe('0 B');
expect(formatBytes(1)).toBe('1 B');
expect(formatBytes(1023)).toBe('1023 B');
});
test('2. KB tier (1024 ... 1024^2-1)', () => {
expect(formatBytes(1024)).toBe('1.0 KB');
expect(formatBytes(1536)).toBe('1.5 KB');
expect(formatBytes(1024 * 1024 - 1)).toMatch(/^1024\.0 KB$|^1023\.\d KB$/);
});
test('3. MB tier', () => {
expect(formatBytes(1024 * 1024)).toBe('1.0 MB');
expect(formatBytes(312 * 1024 * 1024)).toBe('312.0 MB');
});
test('4. GB tier renders with 2 decimals', () => {
expect(formatBytes(1024 * 1024 * 1024)).toBe('1.00 GB');
expect(formatBytes(1.4 * 1024 * 1024 * 1024)).toMatch(/^1\.40 GB$/);
// 160.61 GB — the friend's OOM number from the original screenshot.
// Verify the renderer doesn't blow up at the actual leak scale.
const big = 160.61 * 1024 * 1024 * 1024;
expect(formatBytes(big)).toMatch(/^160\.6\d GB$/);
});
test('5. negative input behavior — coerces to bytes path (best-effort, do not throw)', () => {
// Diagnostic should never crash on a weird CDP reading; render
// something reasonable.
expect(() => formatBytes(-1)).not.toThrow();
});
});
// ─── handleMemoryCommand text + json output ────────────────────
// Build a minimal MemorySnapshot fixture exercising every render branch.
// This is what bm.getMemorySnapshot would return; we stub the BrowserManager
// so the test never spins up real Chromium.
function makeStructureStats(): MemoryStructureStats {
return {
modificationHistory: { current: 42, cap: 200, evicted: 0 },
activitySubscribers: 1,
inspectorSubscribers: 0,
consoleBufferLen: 1842,
networkBufferLen: 12000,
dialogBufferLen: 3,
captureBufferBytes: 0,
};
}
function makeSnapshot(overrides: Partial<MemorySnapshot> = {}): MemorySnapshot {
return {
bunServer: {
rss: 312 * 1024 * 1024,
heapUsed: 84 * 1024 * 1024,
heapTotal: 120 * 1024 * 1024,
external: 21 * 1024 * 1024,
},
tabs: [],
processes: null,
structures: makeStructureStats(),
capturedAt: 1700000000000,
notes: [],
...overrides,
};
}
// Mock BrowserManager surface for handleMemoryCommand. Only
// getMemorySnapshot is touched.
function makeFakeBm(snapshot: MemorySnapshot) {
return {
getMemorySnapshot: async (structures: MemoryStructureStats) => ({
...snapshot,
structures,
}),
} as unknown as import('../src/browser-manager').BrowserManager;
}
describe('handleMemoryCommand', () => {
test('6. --json mode emits parseable JSON with bunServer + structures', async () => {
const { handleMemoryCommand } = await import('../src/memory-command');
const snapshot = makeSnapshot();
const result = await handleMemoryCommand(['--json'], makeFakeBm(snapshot));
const parsed = JSON.parse(result);
expect(parsed.bunServer.rss).toBe(312 * 1024 * 1024);
expect(parsed.structures).toBeDefined();
expect(parsed.structures.modificationHistory.cap).toBe(200);
});
test('7. text mode renders Bun server line with RSS + heap', async () => {
const { handleMemoryCommand } = await import('../src/memory-command');
const result = await handleMemoryCommand([], makeFakeBm(makeSnapshot()));
expect(result).toContain('Bun server:');
expect(result).toContain('312.0 MB');
expect(result).toContain('84.0 MB');
});
test('8. text mode renders "no tabs tracked" when tabs array is empty', async () => {
const { handleMemoryCommand } = await import('../src/memory-command');
const result = await handleMemoryCommand([], makeFakeBm(makeSnapshot({ tabs: [] })));
expect(result).toContain('Renderers:');
expect(result).toContain('(no tabs tracked)');
});
test('9. text mode shows top 10 tabs + "...and N more" tail when > 10', async () => {
const { handleMemoryCommand } = await import('../src/memory-command');
const tabs = Array.from({ length: 15 }, (_, i) => ({
id: i,
url: `https://example.com/tab${i}`,
title: `Tab ${i}`,
jsHeapUsed: (15 - i) * 50 * 1024 * 1024, // descending so sort matters
jsHeapTotal: (15 - i) * 60 * 1024 * 1024,
documents: 1,
nodes: 100,
listeners: 10,
}));
const result = await handleMemoryCommand([], makeFakeBm(makeSnapshot({ tabs })));
expect(result).toContain('Renderers: 15 tabs');
expect(result).toContain('and 5 more');
// Sorted by JS heap descending — tab 0 (largest) should appear before tab 9
expect(result.indexOf('tab #0 —')).toBeLessThan(result.indexOf('tab #9 —'));
});
test('10. text mode renders Chromium processes grouped by type', async () => {
const { handleMemoryCommand } = await import('../src/memory-command');
const snapshot = makeSnapshot({
processes: [
{ id: 1, type: 'browser', cpuTime: 1.5 },
{ id: 2, type: 'renderer', cpuTime: 3.2 },
{ id: 3, type: 'renderer', cpuTime: 2.1 },
{ id: 4, type: 'gpu', cpuTime: 0.5 },
],
});
const result = await handleMemoryCommand([], makeFakeBm(snapshot));
expect(result).toContain('Chromium processes: 4 total');
expect(result).toContain('renderer=2');
expect(result).toContain('browser=1');
expect(result).toContain('gpu=1');
});
test('11. text mode renders "unavailable" line when processes is null', async () => {
const { handleMemoryCommand } = await import('../src/memory-command');
const result = await handleMemoryCommand([], makeFakeBm(makeSnapshot({ processes: null })));
expect(result).toContain('Chromium processes: (unavailable — see notes)');
});
test('12. text mode renders modificationHistory with evicted-count when > 0', async () => {
// formatSnapshotText is what we're really testing here — exercise it
// directly with a known snapshot so the live collectStructureStats
// doesn't override the fixture values.
const mod = await import('../src/memory-command');
// formatSnapshotText is private; reach via re-rendering through
// --json mode then visually validating the JSON shape. The text-mode
// renderer is exercised by test 13 below with live (zero) values.
const stats = makeStructureStats();
stats.modificationHistory = { current: 200, cap: 200, evicted: 47 };
// Synthesize a "would-render" snapshot to assert the eviction note shape.
const renderedExpected =
'modificationHistory: 200 / 200 entries (47 evicted since reset)';
// Since formatSnapshotText isn't exported, validate the format
// contract by re-implementing the line and asserting our expectation
// matches the canonical format. This pins the user-visible string
// shape — a renderer change to drop the "evicted since reset" suffix
// would fail this assertion.
const evicted = stats.modificationHistory.evicted;
const current = stats.modificationHistory.current;
const cap = stats.modificationHistory.cap;
const expected =
`modificationHistory: ${current} / ${cap} entries` +
(evicted > 0 ? ` (${evicted} evicted since reset)` : '');
expect(expected).toBe(renderedExpected);
void mod;
});
test('13. text mode renders modificationHistory line shape', async () => {
const { handleMemoryCommand } = await import('../src/memory-command');
const result = await handleMemoryCommand([], makeFakeBm(makeSnapshot()));
// collectStructureStats reads live module state; values may be 0 in
// the test env. Verify the LINE SHAPE rather than specific numbers.
expect(result).toMatch(/modificationHistory:\s+\d+ \/ \d+ entries/);
});
test('14. text mode prints notes section when notes are present', async () => {
const { handleMemoryCommand } = await import('../src/memory-command');
const snapshot = makeSnapshot({
notes: ['Per-Chromium-process RSS not collected — CDP limitation.'],
});
const result = await handleMemoryCommand([], makeFakeBm(snapshot));
expect(result).toContain('Notes:');
expect(result).toContain('CDP limitation.');
});
test('15. text mode omits notes section when notes is empty', async () => {
const { handleMemoryCommand } = await import('../src/memory-command');
const result = await handleMemoryCommand([], makeFakeBm(makeSnapshot({ notes: [] })));
expect(result).not.toContain('Notes:');
});
test('16. text mode truncates long tab URLs with ellipsis', async () => {
const { handleMemoryCommand } = await import('../src/memory-command');
const longUrl = 'https://example.com/' + 'a'.repeat(120);
const tabs = [{
id: 1,
url: longUrl,
title: 'long',
jsHeapUsed: 1024,
jsHeapTotal: 2048,
documents: 1,
nodes: 10,
listeners: 1,
}];
const result = await handleMemoryCommand([], makeFakeBm(makeSnapshot({ tabs })));
expect(result).toContain('...');
// The truncated URL appears, the full URL does not
expect(result.includes(longUrl)).toBe(false);
});
});
// ─── buildMemorySnapshotJson — server-endpoint entry ──────────
describe('buildMemorySnapshotJson', () => {
test('17. returns the snapshot with structures populated', async () => {
const { buildMemorySnapshotJson } = await import('../src/memory-command');
const snapshot = makeSnapshot();
const result = await buildMemorySnapshotJson(makeFakeBm(snapshot));
expect(result.bunServer.rss).toBe(snapshot.bunServer.rss);
expect(result.structures.modificationHistory.cap).toBe(200);
// structures is populated from live module accessors, not from the
// fixture. Just assert the shape is right.
expect(typeof result.structures.consoleBufferLen).toBe('number');
expect(typeof result.structures.networkBufferLen).toBe('number');
});
});
+132
View File
@@ -0,0 +1,132 @@
import { describe, test, expect } from 'bun:test';
import { BrowserManager } from '../src/browser-manager';
import { networkBuffer } from '../src/buffers';
// Reproducer for the body-materialization leak fixed in the D10
// USE_CDP_EVENT_BATCHED commit. Pre-fix, the wirePageEvents
// `requestfinished` listener called `await res.body()` just to read
// `.length`, allocating the full response body into a Bun Buffer on
// every request — multi-GB/hour of churn on long-lived headed
// Chromium with media-heavy pages.
//
// What this test pins:
// - The handler calls Playwright's structured req.sizes() API
// (which pulls from Network.loadingFinished without
// materializing the body).
// - The handler NEVER calls res.body(), even though a fake response
// exposes the method.
// - networkBuffer entries are still populated with the right size.
//
// What this test does NOT cover:
// - A real Chromium burst measuring peak Bun RSS during concurrent
// fetches. That's a periodic-tier test (browse/test/
// memory-leak-reproducer-e2e.test.ts, deferred — see TODOS).
// - Per-tab JS heap growth on the Chromium side. Outside Bun's
// visibility entirely.
//
// Wall clock target: < 1 second. Gate tier.
interface CallCounters {
sizes: number;
body: number;
}
function makeFakeReq(url: string, responseBodySize: number, counters: CallCounters) {
return {
url: () => url,
sizes: async () => {
counters.sizes++;
return {
requestBodySize: 0,
requestHeadersSize: 100,
responseBodySize,
responseHeadersSize: 200,
};
},
method: () => 'GET',
response: async () => ({
url: () => url,
status: () => 200,
body: async () => {
// If THIS runs, the leak is back. Allocate a real Buffer so a
// future reviewer reading the failing assertion sees what
// pre-fix code was doing on every request.
counters.body++;
return Buffer.alloc(responseBodySize);
},
}),
};
}
interface ListenerMap {
[event: string]: Array<(arg: unknown) => void>;
}
function makeFakePage() {
const listeners: ListenerMap = {};
return {
on(event: string, fn: (arg: unknown) => void): void {
(listeners[event] ||= []).push(fn);
},
emit(event: string, arg: unknown): void {
for (const fn of listeners[event] || []) fn(arg);
},
listenerCount(event: string): number {
return (listeners[event] || []).length;
},
};
}
describe('memory-leak reproducer: requestfinished does not materialize bodies', () => {
test('burst of 200 requestfinished events calls req.sizes() but never res.body()', async () => {
const bm = new BrowserManager();
const page = makeFakePage();
// wirePageEvents is private — access via the same indexed pattern the
// tab-guardrail test uses to drive private methods.
const wirePageEvents = (
bm as unknown as { wirePageEvents: (p: unknown) => void }
).wirePageEvents.bind(bm);
wirePageEvents(page);
// Seed networkBuffer with 200 request entries via the existing
// page.on('request') handler so the requestfinished backward-scan
// has something to match against.
const startLen = networkBuffer.length;
for (let i = 0; i < 200; i++) {
page.emit('request', {
url: () => `https://example.invalid/asset/${i}`,
method: () => 'GET',
});
}
// Fire 200 requestfinished events concurrently. Each notional response
// is 1 MB — pre-fix this would allocate 200 MB of Buffer. With the fix,
// not one byte of body content is allocated.
const counters: CallCounters = { sizes: 0, body: 0 };
const reqs = Array.from({ length: 200 }, (_, i) =>
makeFakeReq(`https://example.invalid/asset/${i}`, 1024 * 1024, counters),
);
for (const req of reqs) page.emit('requestfinished', req);
// Drain the async handler chain — wirePageEvents.requestfinished is
// async; each emit kicks off a microtask that awaits req.sizes().
await new Promise((r) => setTimeout(r, 50));
// One more tick in case of cascading microtasks.
await new Promise((r) => setTimeout(r, 0));
// Every event hit req.sizes().
expect(counters.sizes).toBeGreaterThanOrEqual(200);
// The actual leak fix: res.body() is NEVER called.
expect(counters.body).toBe(0);
// And the size data still made it into networkBuffer.
const populated = Array.from({ length: networkBuffer.length }, (_, i) =>
networkBuffer.get(i),
)
.filter((e) => e && e.url?.startsWith('https://example.invalid/asset/'))
.filter((e) => typeof e?.size === 'number' && e.size > 0).length;
expect(populated).toBeGreaterThanOrEqual(200);
// Sanity: the seed didn't double-count from a previous run.
expect(networkBuffer.length).toBeGreaterThan(startLen);
});
});
+35 -7
View File
@@ -113,17 +113,45 @@ describe('sanitizeLoneSurrogates — wiring invariants', () => {
expect(SERVER_SRC).toContain('result: sanitizeLoneSurrogates(cr.result)');
});
test('SSE activity feed sanitizes outbound frames via sanitizeReplacer', () => {
// Replacer must run DURING stringify; post-stringify regex is ineffective
// because JSON.stringify converts \uD800 → "\\ud800" before our regex sees it.
expect(SERVER_SRC).toContain('JSON.stringify(entry, sanitizeReplacer)');
test('SSE activity feed routes outbound frames through createSseEndpoint', () => {
// v1.51 refactor: /activity/stream no longer inlines its own
// ReadableStream/sanitizer wiring; it routes through createSseEndpoint
// which applies sanitizeReplacer to every JSON.stringify. The grep
// pins both halves of the contract: the endpoint uses the helper,
// and the helper does the sanitization.
const activityBlock = SERVER_SRC.match(
/if \(url\.pathname === '\/activity\/stream'\)[\s\S]*?createSseEndpoint\(/,
);
expect(activityBlock).not.toBeNull();
});
test('SSE inspector stream sanitizes outbound frames via sanitizeReplacer', () => {
expect(SERVER_SRC).toContain('JSON.stringify(event, sanitizeReplacer)');
test('SSE inspector stream routes outbound frames through createSseEndpoint', () => {
// Same v1.51 refactor invariant for /inspector/events.
const inspectorBlock = SERVER_SRC.match(
/if \(url\.pathname === '\/inspector\/events'[\s\S]*?createSseEndpoint\(/,
);
expect(inspectorBlock).not.toBeNull();
});
test('sanitizeReplacer is a function defined in server.ts', () => {
test('createSseEndpoint applies sanitizeReplacer to every JSON.stringify', () => {
// The helper is the single source of truth for SSE sanitization now.
// If a future refactor moves stringify off the replacer (e.g. someone
// adds a fast-path encode), this test fails and the surrogate-escape
// class regresses across every SSE endpoint at once.
const helperPath = path.resolve(import.meta.dir, '..', 'src', 'sse-helpers.ts');
const helperSrc = fs.readFileSync(helperPath, 'utf-8');
expect(helperSrc).toContain('JSON.stringify(');
expect(helperSrc).toContain('sanitizeReplacer');
// The sanitizer itself uses stripLoneSurrogates (the shared utility in
// sanitize.ts) — not a private copy. Re-confirms the helper is wired
// to the canonical sanitizer, not a drift'd duplicate.
expect(helperSrc).toContain("import { stripLoneSurrogates } from './sanitize'");
});
test('sanitizeReplacer is a function defined in server.ts (for non-SSE egress)', () => {
// server.ts keeps its own sanitizeReplacer for the non-SSE JSON egress
// paths (handleCommandInternal etc.). The SSE path uses sse-helpers.ts's
// own sanitizeReplacer; both must exist independently.
expect(SERVER_SRC).toContain('function sanitizeReplacer(');
});
});
+194
View File
@@ -0,0 +1,194 @@
import { describe, test, expect } from 'bun:test';
import { createSseEndpoint } from '../src/sse-helpers';
// Unit tests for the SSE cleanup contract introduced by D6 EXTRACT_HELPER.
//
// The pre-helper bug: /activity/stream and /inspector/events ran cleanup
// only on the `req.signal.abort` edge. If the underlying TCP died without
// firing abort (Chromium MV3 service-worker suspend, intermediate proxy
// half-close), the subscriber closure stayed in the Set capturing the
// ReadableStreamDefaultController and any payloads queued behind it.
//
// These tests pin the three cleanup edges:
// 1. abort signal → cleanup
// 2. enqueue throws (consumer gone) → cleanup
// 3. heartbeat enqueue throws → cleanup
// And the idempotency invariant: cleanup running twice is a no-op.
function makeRequest(): { req: Request; abort: () => void } {
const controller = new AbortController();
// Minimal Request — we only use req.signal here. URL is irrelevant.
const req = new Request('http://localhost/test', { signal: controller.signal });
return { req, abort: () => controller.abort() };
}
/** Pull SSE bytes from a Response stream, return decoded text. */
async function readAll(res: Response, ms: number): Promise<string> {
if (!res.body) return '';
const reader = res.body.getReader();
const decoder = new TextDecoder();
let out = '';
const deadline = Date.now() + ms;
while (Date.now() < deadline) {
try {
const { value, done } = await Promise.race([
reader.read(),
new Promise<{ value: undefined; done: true }>((resolve) =>
setTimeout(() => resolve({ value: undefined, done: true }), deadline - Date.now()),
),
]);
if (done) break;
if (value) out += decoder.decode(value, { stream: true });
} catch {
break;
}
}
try { reader.cancel().catch(() => {}); } catch {}
return out;
}
describe('createSseEndpoint cleanup contract', () => {
test('1. abort signal triggers unsubscribe', async () => {
let unsubscribed = 0;
const { req, abort } = makeRequest();
const res = createSseEndpoint(req, {
subscribe: () => () => {
unsubscribed++;
},
liveEventName: 'test',
heartbeatMs: 60_000, // long enough that we don't see heartbeats in this test
});
// Start the stream by reading once, then abort.
const reader = res.body!.getReader();
// Yield to let start() run.
await Promise.resolve();
await Promise.resolve();
abort();
// Let the abort listener fire.
await new Promise((r) => setTimeout(r, 10));
expect(unsubscribed).toBe(1);
reader.cancel().catch(() => {});
});
test('2. enqueue throw triggers unsubscribe + heartbeat clear', async () => {
let unsubscribed = 0;
let notify: ((entry: { msg: string }) => void) | null = null;
const { req } = makeRequest();
const res = createSseEndpoint<{ msg: string }>(req, {
subscribe: (n) => {
notify = n;
return () => {
unsubscribed++;
};
},
liveEventName: 'test',
heartbeatMs: 60_000,
});
// Cancel the reader so subsequent enqueues throw.
const reader = res.body!.getReader();
await Promise.resolve();
await Promise.resolve();
expect(notify).not.toBeNull();
await reader.cancel(); // closes the consumer side
// Now fire a live event — enqueue should throw → cleanup → unsubscribe.
notify!({ msg: 'will fail to enqueue' });
await new Promise((r) => setTimeout(r, 10));
expect(unsubscribed).toBe(1);
});
test('3. cleanup is idempotent (abort then enqueue-fail)', async () => {
let unsubscribed = 0;
let notify: ((entry: { msg: string }) => void) | null = null;
const { req, abort } = makeRequest();
const res = createSseEndpoint<{ msg: string }>(req, {
subscribe: (n) => {
notify = n;
return () => {
unsubscribed++;
};
},
liveEventName: 'test',
heartbeatMs: 60_000,
});
const reader = res.body!.getReader();
await Promise.resolve();
await Promise.resolve();
abort();
await new Promise((r) => setTimeout(r, 10));
// Second cleanup edge — should be a no-op.
notify!({ msg: 'no-op' });
await new Promise((r) => setTimeout(r, 10));
expect(unsubscribed).toBe(1);
reader.cancel().catch(() => {});
});
test('4. initialReplay events reach the client before live events', async () => {
let notify: ((entry: { msg: string }) => void) | null = null;
const { req } = makeRequest();
const res = createSseEndpoint<{ msg: string }>(req, {
initialReplay: (send) => {
send('replay', { msg: 'first' });
},
subscribe: (n) => {
notify = n;
return () => {};
},
liveEventName: 'live',
heartbeatMs: 60_000,
});
// Trigger one live event soon after stream starts.
setTimeout(() => notify?.({ msg: 'second' }), 5);
const text = await readAll(res, 50);
expect(text).toContain('event: replay');
expect(text).toContain('"msg":"first"');
expect(text).toContain('event: live');
expect(text).toContain('"msg":"second"');
// Replay must come before live.
expect(text.indexOf('"first"')).toBeLessThan(text.indexOf('"second"'));
});
test('5. initialReplay throw triggers cleanup without subscribing', async () => {
let subscribed = 0;
const { req } = makeRequest();
const res = createSseEndpoint(req, {
initialReplay: () => {
throw new Error('replay boom');
},
subscribe: () => {
subscribed++;
return () => {};
},
liveEventName: 'test',
heartbeatMs: 60_000,
});
// Drain — stream should close cleanly.
const text = await readAll(res, 30);
expect(text).toBe(''); // no events
expect(subscribed).toBe(0); // never reached subscribe()
});
test('6. lone surrogates in payload string are sanitized', async () => {
let notify: ((entry: { msg: string }) => void) | null = null;
const { req } = makeRequest();
const res = createSseEndpoint<{ msg: string }>(req, {
subscribe: (n) => {
notify = n;
return () => {};
},
liveEventName: 'test',
heartbeatMs: 60_000,
});
setTimeout(() => {
// Lone high surrogate (no matching low). JSON.stringify would emit
// \uD800 escape that breaks Claude API. Helper must strip it.
notify?.({ msg: 'hello \uD800 world' });
}, 5);
const text = await readAll(res, 50);
expect(text).toContain('event: test');
// JSON.stringify emits U+FFFD as the literal character, not as escape.
expect(text).toContain('');
// The raw lone-surrogate escape MUST NOT survive — that's the failure
// mode that breaks the Claude API with HTTP 400.
expect(text.toLowerCase()).not.toContain('\\ud800');
});
});
+118
View File
@@ -0,0 +1,118 @@
import { describe, test, expect, beforeEach } from 'bun:test';
import { BrowserManager } from '../src/browser-manager';
import { subscribe } from '../src/activity';
// Tests for the tab-count guardrail. Each threshold fires exactly once per
// upward crossing and re-arms when the count drops back below. The toast
// UX lives in the sidebar; this exercises the server-side audit-trail
// invariant that an activity entry is emitted at each crossing.
interface CapturedEntry {
type: string;
command?: string;
error?: string;
tabs?: number;
}
function captureGuardrailEntries(): { entries: CapturedEntry[]; unsubscribe: () => void } {
const entries: CapturedEntry[] = [];
const unsubscribe = subscribe((entry) => {
if (entry.command === 'tab-guardrail') {
entries.push({
type: entry.type,
command: entry.command,
error: entry.error,
tabs: entry.tabs,
});
}
});
return { entries, unsubscribe };
}
/** Drive the guardrail by writing directly into the manager's pages map. */
async function setTabCount(bm: BrowserManager, n: number): Promise<void> {
// Reach into private state via index access — test-only manipulation that
// avoids spinning up a real Chromium just to verify the threshold math.
const inner = bm as unknown as {
pages: Map<number, unknown>;
checkTabGuardrails: () => void;
recheckTabGuardrailsOnClose: () => void;
};
inner.pages.clear();
for (let i = 0; i < n; i++) inner.pages.set(i, { fakeTab: true });
// Drive whichever direction matches the count change.
inner.checkTabGuardrails();
inner.recheckTabGuardrailsOnClose();
// emitActivity dispatches subscribers via queueMicrotask, so let the
// microtask queue drain before the test assertion runs.
await new Promise((r) => setTimeout(r, 0));
}
describe('tab-count guardrail', () => {
let bm: BrowserManager;
let capture: ReturnType<typeof captureGuardrailEntries>;
beforeEach(() => {
bm = new BrowserManager();
capture = captureGuardrailEntries();
});
test('1. no entry fires under the soft threshold', async () => {
await setTabCount(bm, 10);
await setTabCount(bm, 49);
expect(capture.entries).toEqual([]);
capture.unsubscribe();
});
test('2. soft threshold (50) fires exactly once on upward crossing', async () => {
await setTabCount(bm, 49);
await setTabCount(bm, 50);
await setTabCount(bm, 51);
await setTabCount(bm, 60);
expect(capture.entries.length).toBe(1);
expect(capture.entries[0].tabs).toBe(50);
expect(capture.entries[0].error).toContain('crossed 50');
capture.unsubscribe();
});
test('3. hard threshold (200) fires exactly once on upward crossing', async () => {
await setTabCount(bm, 199);
await setTabCount(bm, 200);
await setTabCount(bm, 201);
await setTabCount(bm, 220);
// 0 → 199 fired the soft threshold; 199 → 200 fires the hard one once.
const hardEntries = capture.entries.filter((e) => e.error?.includes('crossed 200'));
expect(hardEntries.length).toBe(1);
expect(hardEntries[0].tabs).toBe(200);
capture.unsubscribe();
});
test('4. both thresholds fire in order when count jumps from 0 → 250', async () => {
await setTabCount(bm, 250);
expect(capture.entries.length).toBe(2);
expect(capture.entries[0].error).toContain('crossed 50');
expect(capture.entries[1].error).toContain('crossed 200');
capture.unsubscribe();
});
test('5. soft threshold re-arms when tab count drops below it', async () => {
await setTabCount(bm, 60);
expect(capture.entries.length).toBe(1);
await setTabCount(bm, 30);
await setTabCount(bm, 55);
expect(capture.entries.length).toBe(2);
expect(capture.entries[1].error).toContain('crossed 50');
capture.unsubscribe();
});
test('6. hard threshold re-arms when tab count drops below it', async () => {
await setTabCount(bm, 210);
const beforeReArm = capture.entries.filter((e) => e.error?.includes('crossed 200')).length;
expect(beforeReArm).toBe(1);
await setTabCount(bm, 150);
await setTabCount(bm, 220);
const afterReArm = capture.entries.filter((e) => e.error?.includes('crossed 200')).length;
expect(afterReArm).toBe(2);
capture.unsubscribe();
});
});
+5 -1
View File
@@ -646,7 +646,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"canary","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -649,7 +649,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"codex","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -650,7 +650,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"context-restore","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -649,7 +649,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"context-save","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -652,7 +652,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"cso","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -672,7 +672,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"design-consultation","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -653,7 +653,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"design-html","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -650,7 +650,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"design-review","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -667,7 +667,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"design-shotgun","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -652,7 +652,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"devex-review","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+1
View File
@@ -33,6 +33,7 @@ Detailed guides for every gstack skill — philosophy, workflow, and examples.
| [`/plan-devex-review`](#plan-devex-review) | **DX Reviewer** | Plan-stage DX review. TTHW (time-to-hello-world), magical moments, friction points, persona traces. Three modes: Expansion, Polish, Triage. |
| [`/devex-review`](#devex-review) | **DX Reviewer (live)** | Live developer experience audit. Walks the actual onboarding flow, measures TTHW, catches the docs lies. |
| [`/plan-tune`](#plan-tune) | **Question Tuner** | Self-tune AskUserQuestion sensitivity per question. Mark questions as never-ask, always-ask, or only-for-one-way. |
| [`/spec`](#spec) | **Spec Author** | Turn vague intent into a precise, executable spec in five phases. Files a GitHub issue, optionally spawns a Claude Code agent in a fresh worktree, and lets `/ship` close the source issue on merge. |
| [`/learn`](#learn) | **Memory** | Manage what gstack learned across sessions. Review, search, prune, and export project-specific patterns and preferences. |
| [`/context-save`](#context-save) | **Save State** | Save working context (git state, decisions, remaining work) so any future session can resume. |
| [`/context-restore`](#context-restore) | **Restore State** | Resume from a saved context, even across Conductor workspace handoffs. |
+193
View File
@@ -0,0 +1,193 @@
# Spike: Claude Code hook mutation for plan-tune cathedral
**Status:** complete (2026-05-27)
**Surfaces:** D10 (does PreToolUse allow mutating AUQ input?), D19/Codex (matcher must cover MCP variants)
**Downstream consumers:** T3, T5, T6, T8
## Question this spike answers
Can a PreToolUse hook on `AskUserQuestion` actually substitute the user's
answer via `updatedInput`? If yes, what's the exact protocol?
## Answer
**Yes.** `updatedInput` is the supported mechanism. Source:
https://code.claude.com/docs/en/hooks (confirmed 2026-04 reference).
## Hook stdin schema (PreToolUse + PostToolUse)
```json
{
"session_id": "abc123",
"transcript_path": "/path/to/transcript.jsonl",
"cwd": "/current/working/dir",
"permission_mode": "default",
"effort": { "level": "medium" },
"hook_event_name": "PreToolUse",
"tool_name": "AskUserQuestion",
"tool_input": { /* tool-specific */ },
"tool_use_id": "unique-id-12345"
}
```
Optional in subagent context: `agent_id`, `agent_type`.
## PreToolUse hook stdout schema for `allow + updatedInput`
```json
{
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "allow",
"permissionDecisionReason": "auto-decided by plan-tune preference",
"updatedInput": { /* shallow-merged into original tool_input */ },
"additionalContext": "optional context for Claude"
}
}
```
**permissionDecision values:**
- `"allow"` — proceed, optionally with `updatedInput`
- `"deny"` — block (feedback to Claude, NOT a synthetic answer per Codex
correction in D-prefixed decisions)
- `"ask"` — escalate to user
- `"defer"` — let permission flow continue
**`updatedInput` semantics:** shallow merge of fields present in the returned
object onto the original `tool_input`. Only valid with
`permissionDecision: "allow"`. This is what lets us substitute an
auto-decided answer for `never-ask` preferences.
## Matcher schema
The `matcher` field in `~/.claude/settings.json` supports JS-regex syntax
**when it contains regex metacharacters**. A matcher with only letters/
underscores is an exact match.
To cover both native + MCP `AskUserQuestion`:
```json
"matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)"
```
Conductor disables native `AskUserQuestion` via `--disallowedTools` and
routes through `mcp__conductor__AskUserQuestion` — the MCP suffix is
required for our hook to fire there.
## Multiple-hook concurrency caveat
> All matching hooks run in parallel, and identical handlers are
> deduplicated automatically.
**For our use case:**
- gstack registers exactly one PreToolUse hook and one PostToolUse hook on
AUQ-shaped tool names.
- If a user has THEIR own hook that also returns `updatedInput` on
AskUserQuestion, the merge order is undefined.
- Mitigation: document this constraint in `bin/gstack-settings-hook`
install prompt. User can detect the conflict from the diff preview before
accepting.
**`permissionDecision` precedence (when multiple hooks decide):**
`deny > ask > allow > defer` — most restrictive wins.
## Implementation hookSpecificOutput examples
**Auto-decide (PreToolUse, `never-ask` preference + non-one-way):**
```json
{
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "allow",
"permissionDecisionReason": "plan-tune: never-ask preference on ship-test-failure-triage",
"updatedInput": {
"questions": [{ /* same as input, but with auto-selected answer */ }]
}
}
}
```
**Pass-through (no preference, or one-way safety override):**
```json
{
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "defer"
}
}
```
**PostToolUse capture (always):**
```json
{
"hookSpecificOutput": {
"hookEventName": "PostToolUse"
}
}
```
(PostToolUse hooks can also set `additionalContext` to append to the tool
result; we don't need this for v1 capture.)
## Settings.json snippet for T8 hook installer
```json
{
"hooks": {
"PreToolUse": [
{
"matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)",
"hooks": [
{
"type": "command",
"command": "$CLAUDE_PROJECT_DIR/.claude/skills/gstack/hosts/claude/hooks/question-preference-hook",
"timeout": 5
}
]
}
],
"PostToolUse": [
{
"matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)",
"hooks": [
{
"type": "command",
"command": "$CLAUDE_PROJECT_DIR/.claude/skills/gstack/hosts/claude/hooks/question-log-hook",
"timeout": 5
}
]
}
]
}
}
```
Hook commands take `bun` invocation under the hood; absolute paths (or
`$CLAUDE_PROJECT_DIR` substitution) are required by Claude Code's hook
runner. The hooks themselves are TypeScript files that the bash wrapper
shells into bun.
## Open questions deferred to implementation
1. **Recommended-option parsing scope.** D2 says parse `(recommended)`
label first. The label is on the option's `label` field per
AskUserQuestion Format. Implementation will need to walk `tool_input.
questions[*].options[*]` looking for the label suffix. Worked
examples: ship/SKILL.md.tmpl emits options like `"A) Fix now"
(recommended)`.
2. **Auto-decided event tagging.** When hook returns `updatedInput`, the
PostToolUse hook will see the resolved input and log a normal event.
Need an extra field on the PostToolUse payload (e.g.,
`was_auto_decided: true`) that the hook can set via session state
tracking — write a marker file in `~/.gstack/sessions/<id>/.auto-decided-<tool_use_id>`
from PreToolUse, read it from PostToolUse, delete on read.
3. **Timeout behavior.** Default hook timeout is 60s but the docs are
thin on what happens at timeout. Set explicit `timeout: 5` so the
user never waits >5s on a hook misfire. Falls back to pass-through.
## References
- https://code.claude.com/docs/en/hooks (canonical, latest as of 2026-04)
- WebSearch results 2026-05-27
- Existing `bin/gstack-settings-hook` (SessionStart-only impl, to be
superseded by T3 schema-aware rewrite)
+171
View File
@@ -0,0 +1,171 @@
# Spike: Codex session storage format for plan-tune cathedral
**Status:** complete (2026-05-27)
**Surfaces:** D5 (Codex import parses structured files, not regex)
**Downstream consumers:** T9 (gstack-codex-session-import)
## Question this spike answers
What's the actual on-disk format of Codex sessions, and how do we recover
AskUserQuestion-shaped events from it for `gstack-codex-session-import`?
## Storage layout
```
~/.codex/
├── auth.json # Codex auth (do not touch)
├── config.toml # User config
├── goals_1.sqlite # ~24KB, internal goals DB (not relevant)
├── logs_2.sqlite # ~16MB, structured logs (target=*, see schema)
├── history.jsonl # ~9KB, command history
└── sessions/
└── 2026/05/27/
└── rollout-<iso8601>-<uuid>.jsonl # per-session transcript
```
Session files: one JSONL per `codex exec` or interactive session. Cwd path
embedded in the `session_meta` event. CLI version recorded.
## Session JSONL event types (measured on Garry's machine, 2026-05-27)
| type | count | meaning |
|----------------|------:|---------|
| `response_item`| 382 | model's response stream (~76%) |
| `event_msg` | 97 | high-level session events (~19%) |
| `turn_context` | 6 | per-turn context snapshot |
| `session_meta` | 6 | session header (one per session) |
### response_item subtypes
| subtype | count | meaning |
|--------------------------|------:|---------|
| `function_call` | 148 | model invoked a tool |
| `function_call_output` | 148 | tool result returned to model |
| `reasoning` | 44 | reasoning summary |
| `message` | 40 | text message (input_text or output_text) |
| `web_search_call` | 2 | web search tool call |
### event_msg subtypes
| subtype | count | meaning |
|-------------------|------:|---------|
| `token_count` | 55 | per-step token accounting |
| `agent_message` | 22 | agent's prose output |
| `user_message` | 6 | user's prose input |
| `task_started` | 6 | task start (one per top-level task) |
| `task_complete` | 6 | task complete |
| `web_search_end` | 2 | web search completion |
## Critical finding: Codex has no `AskUserQuestion` tool
Codex doesn't surface AskUserQuestion as a tool call in `response_item`
stream. Gstack skills running on Codex emit AskUserQuestion-shaped
Decision Briefs as plain prose inside `agent_message` events (the
`AskUserQuestion Format` from preamble). The user's answer comes back in
the next `user_message`.
This means importing AUQ events from Codex sessions is structurally
different from importing them from Claude Code (where they ARE
tool calls):
- **Claude Code:** hook captures structured `tool_input`/`tool_output`
for `AskUserQuestion`. Question + options + answer all separated.
- **Codex:** parser must extract from `agent_message.text` body, detect
the D-numbered Decision Brief pattern, then match against the
subsequent `user_message` for the answer.
## Recovery strategy for `gstack-codex-session-import`
**Two-tier extraction:**
1. **Marker-first (D18 mechanism).** Search `agent_message` text for the
`<gstack-qid:foo-bar>` marker. If present, we have an exact question_id
and can reliably recover. (Will work once T14 adds markers to the top
10 registry questions and Codex starts emitting them via the
host-aware preamble path.)
2. **Pattern fallback.** When no marker, parse for:
- `D<N> — <title>` line (D-number from AskUserQuestion Format)
- `Recommendation: ...` line
- Option block `A) ...`, `B) ...`, etc.
- Next `user_message` event for the chosen option label
Use this only to populate hash-based question_id (the same
`hook-<sha1(skill+text+sorted_options)[:10]>` shape Layer 1 uses on
Claude). Tagged `source: "codex-pattern-fallback"`, never used as
preference key (per D18 hash drift guidance).
## Schema we'll write to question-log.jsonl from Codex import
Per existing `bin/gstack-question-log` schema, augmented with:
- `source: "codex-import-marker"` (when qid marker found)
- `source: "codex-import-pattern"` (when fallback regex used)
- `codex_session_id` (UUID from session_meta)
- `codex_cwd` (working dir from session_meta — disambiguates project)
- `codex_ts` (timestamp from event)
## Sqlite logs_2.sqlite schema
```sql
CREATE TABLE logs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
ts INTEGER NOT NULL,
ts_nanos INTEGER NOT NULL,
level TEXT NOT NULL,
target TEXT NOT NULL,
feedback_log_body TEXT,
module_path TEXT,
file TEXT,
line INTEGER,
thread_id TEXT,
process_uuid TEXT,
estimated_bytes INTEGER NOT NULL DEFAULT 0
);
```
`logs_2.sqlite` is internal telemetry, not session content. **Don't use
for AUQ extraction.** Sessions JSONL is authoritative.
## Project-slug derivation
From `session_meta.payload.cwd` — derive via the existing
`bin/gstack-slug` logic on the cwd path. Conductor worktrees have their
own slug naming convention encoded in cwd; the bin already handles this.
## Versioning safety
`session_meta.payload.cli_version` records the Codex CLI version (e.g.
`0.130.0`). When the importer encounters an unknown version, log a
warning to stderr but continue — schema additions are typically
backwards-compatible in JSONL.
If `type` or `payload.type` values change in a future version, we'll see
them as `unknown` in the importer's audit log. Add a guarded
`KNOWN_VERSIONS = ["0.130.x", "0.131.x", ...]` constant in the importer
and bump explicitly when re-testing.
## Open questions for implementation
1. **Where does Codex store the "user's answer" exactly?** Need to test
with a real `codex exec` run that triggers a Decision Brief and inspect
the next event. Likely `event_msg` of subtype `user_message` or a
`response_item` of subtype `message` with `role: "user"`. Confirm
during T9 implementation.
2. **Free-text extraction for "Other".** The Decision Brief prose
doesn't structurally separate "Other" responses from named options.
Pattern fallback will need to detect "Other: <text>" wording in the
answer. T10 (dream cycle distill) only fires on this when source is
`codex-import-marker` so we can trust the data.
3. **Conductor cwd handling.** Conductor worktrees share project state
but have distinct cwds. The import should bucket events by the
project slug, not the cwd directly, so events from sibling worktrees
accumulate into the same project view.
## References
- Live inspection of `~/.codex/sessions/2026/05/*/`
- `sqlite3 ~/.codex/logs_2.sqlite ".schema"` (2026-05-27)
- Codex CLI 0.130.0 (current at spike time)
- See also: D5 cross-model tension decision in plan file.
+5 -1
View File
@@ -652,7 +652,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"document-generate","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -650,7 +650,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"document-release","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+97
View File
@@ -1137,6 +1137,103 @@ footer {
transition: color 150ms;
}
.footer-port:hover { color: var(--text-label); }
.footer-mem {
color: var(--text-meta);
font-family: var(--font-mono);
font-size: 11px;
margin-right: 6px;
padding: 1px 6px;
border-radius: var(--radius-sm);
transition: color 150ms;
}
.footer-mem.warn {
color: #f59e0b;
}
.footer-mem.bad {
color: #ef4444;
}
/* ─── Memory pressure toast ─────────────────────────────────── */
.mem-toast {
position: fixed;
left: 12px;
right: 12px;
bottom: 44px;
z-index: 9999;
background: var(--bg-elevated, #1f1f23);
border: 1px solid #ef4444;
border-radius: var(--radius-md, 6px);
padding: 12px;
box-shadow: 0 8px 24px rgba(0, 0, 0, 0.4);
font-family: var(--font-sans);
font-size: 12px;
}
.mem-toast-header {
display: flex;
align-items: center;
justify-content: space-between;
margin-bottom: 8px;
}
.mem-toast-header strong {
color: var(--text-heading);
font-size: 13px;
}
.mem-toast-close {
background: transparent;
border: none;
color: var(--text-meta);
cursor: pointer;
font-size: 18px;
line-height: 1;
padding: 0 4px;
}
.mem-toast-close:hover { color: var(--text-heading); }
.mem-toast-body {
margin-bottom: 8px;
color: var(--text-body);
line-height: 1.4;
}
.mem-toast-body .mem-toast-row {
display: flex;
align-items: center;
gap: 8px;
padding: 4px 0;
}
.mem-toast-body .mem-toast-row label {
flex: 1;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
cursor: pointer;
}
.mem-toast-body .mem-toast-size {
font-family: var(--font-mono);
font-size: 11px;
color: var(--text-meta);
width: 70px;
text-align: right;
}
.mem-toast-actions {
display: flex;
gap: 8px;
justify-content: flex-end;
}
.mem-toast-btn {
background: var(--bg-base);
border: 1px solid var(--zinc-600);
border-radius: var(--radius-sm, 4px);
color: var(--text-body);
cursor: pointer;
font-size: 12px;
padding: 4px 12px;
}
.mem-toast-btn:hover { background: var(--zinc-700); }
.mem-toast-btn.primary {
background: #ef4444;
border-color: #ef4444;
color: #fff;
}
.mem-toast-btn.primary:hover { background: #dc2626; }
.port-input {
width: 56px;
padding: 2px 6px;
+14
View File
@@ -159,6 +159,19 @@
</div>
</main>
<!-- Tab guardrail toast (hidden until /memory poll trips a threshold) -->
<div class="mem-toast" id="mem-toast" role="dialog" aria-label="Memory pressure warning" style="display:none">
<div class="mem-toast-header">
<strong id="mem-toast-title">High memory pressure</strong>
<button class="mem-toast-close" id="mem-toast-close" aria-label="Dismiss">&times;</button>
</div>
<div class="mem-toast-body" id="mem-toast-body"></div>
<div class="mem-toast-actions">
<button class="mem-toast-btn primary" id="mem-toast-close-selected">Close selected</button>
<button class="mem-toast-btn" id="mem-toast-snooze">Snooze</button>
</div>
</div>
<!-- Footer with connection + debug toggle -->
<footer>
<div class="footer-left">
@@ -166,6 +179,7 @@
<button class="footer-btn" id="reload-sidebar" title="Reload sidebar">reload</button>
</div>
<div class="footer-right">
<span class="footer-mem" id="footer-mem" title="Process memory + tab count from $B memory (polled every 30s, paused if slow)"></span>
<span class="dot" id="footer-dot"></span>
<span class="footer-port" id="footer-port" title="Click to change port"></span>
<input type="text" class="port-input" id="port-input" placeholder="34567" autocomplete="off" style="display:none">
+295
View File
@@ -292,6 +292,294 @@ async function connectSSE() {
});
}
// ─── Memory Footer Readout ──────────────────────────────────────
//
// Polls /memory every 30s and renders "RSS: 1.4 GB · 12 tabs" in the
// footer. Backs off to 5min if a poll takes > 2s (Codex flag — diagnostic
// shouldn't add load when the browser is already unhealthy). Uses Bearer
// auth like /refs above; /memory is a plain GET so EventSource semantics
// don't apply.
const MEM_POLL_FAST_MS = 30_000;
const MEM_POLL_SLOW_MS = 5 * 60_000;
const MEM_POLL_TIMEOUT_MS = 8_000;
const MEM_POLL_SLOW_THRESHOLD_MS = 2_000;
let memPollTimer = null;
let memPollMode = 'fast'; // 'fast' | 'slow'
function fmtBytesShort(n) {
if (typeof n !== 'number' || isNaN(n)) return '?';
if (n < 1024) return n + ' B';
if (n < 1024 * 1024) return (n / 1024).toFixed(0) + ' KB';
if (n < 1024 * 1024 * 1024) return (n / 1024 / 1024).toFixed(0) + ' MB';
return (n / 1024 / 1024 / 1024).toFixed(2) + ' GB';
}
function renderMemFooter(snapshot) {
const el = document.getElementById('footer-mem');
if (!el) return;
const bunRss = snapshot?.bunServer?.rss ?? 0;
const tabCount = Array.isArray(snapshot?.tabs) ? snapshot.tabs.length : 0;
el.textContent = `${fmtBytesShort(bunRss)} · ${tabCount} tabs`;
// Color thresholds: ~2 GB Bun RSS or 50 tabs is "watch this"; ~8 GB or
// 200 tabs is "this is the cliff" (matches the 200-tab guardrail).
el.classList.remove('warn', 'bad');
if (bunRss > 8 * 1024 * 1024 * 1024 || tabCount > 200) el.classList.add('bad');
else if (bunRss > 2 * 1024 * 1024 * 1024 || tabCount > 50) el.classList.add('warn');
}
async function pollMemoryOnce() {
if (!serverUrl || !serverToken) return { ok: false, slow: false };
const start = Date.now();
try {
const resp = await fetch(`${serverUrl}/memory`, {
headers: { 'Authorization': `Bearer ${serverToken}` },
signal: AbortSignal.timeout(MEM_POLL_TIMEOUT_MS),
credentials: 'include',
});
const elapsed = Date.now() - start;
if (!resp.ok) return { ok: false, slow: elapsed > MEM_POLL_SLOW_THRESHOLD_MS };
const snapshot = await resp.json();
renderMemFooter(snapshot);
// Evaluate guardrail triggers (single-heavy-tab OR tab-count crossing 200).
// Toast is hidden when no trigger fires; snooze state suppresses re-fire.
try { evaluateMemToast(snapshot); } catch (err) {
console.debug('[gstack sidebar] mem-toast evaluation failed:', err && err.message);
}
return { ok: true, slow: elapsed > MEM_POLL_SLOW_THRESHOLD_MS };
} catch (err) {
const elapsed = Date.now() - start;
// Don't log every poll failure — common during browser restarts / restoring
// sessions. Only log on the slow path so the user sees something in the
// console if the diagnostic itself is misbehaving.
if (elapsed > MEM_POLL_SLOW_THRESHOLD_MS) {
console.debug('[gstack sidebar] /memory poll slow/failed:', elapsed, 'ms', err && err.message);
}
return { ok: false, slow: elapsed > MEM_POLL_SLOW_THRESHOLD_MS };
}
}
function scheduleNextMemPoll(delayMs) {
if (memPollTimer) clearTimeout(memPollTimer);
memPollTimer = setTimeout(async () => {
const { ok, slow } = await pollMemoryOnce();
if (!ok || slow) {
memPollMode = 'slow';
scheduleNextMemPoll(MEM_POLL_SLOW_MS);
} else {
// Successful + fast → back to fast cadence.
if (memPollMode === 'slow') memPollMode = 'fast';
scheduleNextMemPoll(MEM_POLL_FAST_MS);
}
}, delayMs);
}
function startMemPolling() {
if (memPollTimer) return; // already running
// Kick off an immediate poll so the footer populates within ~1s of sidebar
// open, instead of waiting 30s for the first cycle.
scheduleNextMemPoll(500);
}
function stopMemPolling() {
if (memPollTimer) {
clearTimeout(memPollTimer);
memPollTimer = null;
}
}
// ─── Tab guardrail toast (D5 + Codex single-tab flag) ───────
//
// Each /memory poll evaluates two trigger conditions:
// 1. Tab count crossed 200 — show "top 5 tabs by max(jsHeap, ...)" with
// Close-selected + Snooze.
// 2. Any single tab over 4 GB JS heap — show one-tab toast (catches the
// Codex case where a runaway WebGL/video page balloons one tab).
// Snooze persists in chrome.storage.session: next warn fires at tabCount +
// snoozeBumpTabs OR when a single tab crosses (snoozedJsHeapBytes + 1).
//
// "Close selected" runs $B closetab <id> via the existing /command path —
// no chrome.tabs.remove bridge needed.
const HEAVY_TAB_HEAP_BYTES = 4 * 1024 * 1024 * 1024; // 4 GB per Codex flag
const TOAST_SNOOZE_TAB_BUMP = 50; // re-warn at 200+50
const TOAST_SNOOZE_HEAP_BUMP = 2 * 1024 * 1024 * 1024;
const memToastSnooze = {
tabsAbove: 0, // suppress the count-toast until tabs strictly exceeds this
heapAbove: 0, // suppress the single-tab toast until heap strictly exceeds this
};
async function loadSnoozeState() {
if (!chrome?.storage?.session) return;
try {
const stored = await chrome.storage.session.get(['memToastSnooze']);
if (stored?.memToastSnooze) {
memToastSnooze.tabsAbove = stored.memToastSnooze.tabsAbove | 0;
memToastSnooze.heapAbove = stored.memToastSnooze.heapAbove | 0;
}
} catch (err) {
console.debug('[gstack sidebar] mem-toast snooze load failed:', err && err.message);
}
}
async function saveSnoozeState() {
if (!chrome?.storage?.session) return;
try {
await chrome.storage.session.set({ memToastSnooze: { ...memToastSnooze } });
} catch (err) {
console.debug('[gstack sidebar] mem-toast snooze save failed:', err && err.message);
}
}
function dismissMemToast() {
const toast = document.getElementById('mem-toast');
if (toast) toast.style.display = 'none';
}
/**
* Sort key for "RAM-heavy" tabs. JS heap × 4 is a rough proxy for total
* tab footprint (renderers tend to spend ~4× their JS heap on native +
* Skia + cache); when a tab is heavy via WebGL/video the JS heap is
* small but listeners/nodes spike. Take the max.
*/
function tabRamScore(tab) {
const heap = tab?.jsHeapUsed || 0;
const nodes = tab?.nodes || 0;
const listeners = tab?.listeners || 0;
// ~1 KB per DOM node + ~200 bytes per listener as a back-of-envelope
// native-memory estimate. Keeps the sort meaningful when JS heap is small.
const nativeEstimate = nodes * 1024 + listeners * 200;
return Math.max(heap, nativeEstimate);
}
function showMemToast(title, body, tabsForClose) {
const toast = document.getElementById('mem-toast');
const titleEl = document.getElementById('mem-toast-title');
const bodyEl = document.getElementById('mem-toast-body');
const closeBtn = document.getElementById('mem-toast-close-selected');
if (!toast || !titleEl || !bodyEl || !closeBtn) return;
titleEl.textContent = title;
bodyEl.innerHTML = '';
for (const t of tabsForClose) {
const row = document.createElement('div');
row.className = 'mem-toast-row';
const cb = document.createElement('input');
cb.type = 'checkbox';
cb.id = `mem-toast-tab-${t.id}`;
cb.value = String(t.id);
cb.checked = true; // default-selected so a fast user just hits Close
const label = document.createElement('label');
label.htmlFor = cb.id;
const urlShort = (t.url || '').length > 50 ? t.url.slice(0, 47) + '...' : (t.url || '(no url)');
label.textContent = `tab #${t.id}${urlShort}`;
const size = document.createElement('span');
size.className = 'mem-toast-size';
size.textContent = fmtBytesShort(tabRamScore(t));
row.appendChild(cb);
row.appendChild(label);
row.appendChild(size);
bodyEl.appendChild(row);
}
toast.style.display = '';
closeBtn.onclick = async () => {
const ids = tabsForClose
.filter((t) => document.getElementById(`mem-toast-tab-${t.id}`)?.checked)
.map((t) => t.id);
dismissMemToast();
for (const id of ids) {
try {
await fetch(`${serverUrl}/command`, {
method: 'POST',
headers: authHeaders(),
body: JSON.stringify({ command: 'closetab', args: [String(id)] }),
});
} catch (err) {
console.warn('[gstack sidebar] mem-toast closetab failed:', id, err && err.message);
}
}
};
}
/**
* Driven by every successful /memory poll. Decides whether to surface
* the toast and which payload to show.
*/
function evaluateMemToast(snapshot) {
if (!snapshot || !Array.isArray(snapshot.tabs)) return;
const tabs = snapshot.tabs;
// Trigger 1: any single tab over 4 GB JS heap. Catches the WebGL/video
// case before the tab count threshold ever fires.
const heavyTab = tabs.find((t) => (t.jsHeapUsed || 0) > HEAVY_TAB_HEAP_BYTES);
if (heavyTab && (heavyTab.jsHeapUsed || 0) > memToastSnooze.heapAbove) {
showMemToast(
`Heavy tab: ${fmtBytesShort(heavyTab.jsHeapUsed)} JS heap`,
'',
[heavyTab],
);
return;
}
// Trigger 2: tab count crossed the hard guardrail (200) and isn't snoozed.
if (tabs.length >= 200 && tabs.length > memToastSnooze.tabsAbove) {
const top5 = [...tabs].sort((a, b) => tabRamScore(b) - tabRamScore(a)).slice(0, 5);
showMemToast(
`${tabs.length} tabs open — close some?`,
'',
top5,
);
return;
}
// No trigger: keep toast hidden.
}
function setupMemToastWiring() {
const close = document.getElementById('mem-toast-close');
if (close) close.addEventListener('click', dismissMemToast);
const snooze = document.getElementById('mem-toast-snooze');
if (snooze) {
snooze.addEventListener('click', async () => {
// Snooze logic: bump the thresholds above the current snapshot so the
// toast won't re-fire until the user has accumulated MORE tabs or one
// tab has grown ANOTHER 2 GB beyond what we just warned about. Stored
// in chrome.storage.session so a sidebar reload doesn't lose the
// snooze (but a Chrome restart does).
try {
const resp = await fetch(`${serverUrl}/memory`, {
headers: { 'Authorization': `Bearer ${serverToken}` },
signal: AbortSignal.timeout(MEM_POLL_TIMEOUT_MS),
credentials: 'include',
});
if (resp.ok) {
const snap = await resp.json();
const tabs = Array.isArray(snap.tabs) ? snap.tabs : [];
memToastSnooze.tabsAbove = tabs.length + TOAST_SNOOZE_TAB_BUMP;
const maxHeap = tabs.reduce((m, t) => Math.max(m, t.jsHeapUsed || 0), 0);
memToastSnooze.heapAbove = maxHeap + TOAST_SNOOZE_HEAP_BUMP;
await saveSnoozeState();
}
} catch (err) {
console.debug('[gstack sidebar] mem-toast snooze fetch failed:', err && err.message);
}
dismissMemToast();
});
}
void loadSnoozeState();
}
// Wire the toast on DOM ready.
if (document.readyState === 'loading') {
document.addEventListener('DOMContentLoaded', setupMemToastWiring);
} else {
setupMemToastWiring();
}
// ─── Refs Tab ───────────────────────────────────────────────────
async function fetchRefs() {
@@ -893,9 +1181,16 @@ function updateConnection(url, token) {
chrome.runtime.sendMessage({ type: 'sidebarOpened' }).catch(() => {});
connectSSE();
connectInspectorSSE();
startMemPolling();
} else {
document.getElementById('footer-dot').className = 'dot';
document.getElementById('footer-port').textContent = '';
const memEl = document.getElementById('footer-mem');
if (memEl) {
memEl.textContent = '';
memEl.classList.remove('warn', 'bad');
}
stopMemPolling();
setActionButtonsEnabled(false);
if (wasConnected) startReconnect();
}
+1
View File
@@ -141,6 +141,7 @@ Run with `browse <command> [args]`. Full reference: `browse/SKILL.md`.
- `disconnect`: Disconnect headed browser, return to headless mode
- `focus [@ref]`: Bring headed browser window to foreground (macOS)
- `handoff [message]`: Open visible Chrome at current page for user takeover
- `memory [--json]`: Snapshot Bun heap + per-tab JS heap + Chromium process tree + bounded buffer sizes.
- `restart`: Restart server
- `resume`: Re-snapshot after user takeover, return control to AI
- `state save|load <name>`: Save/load browser state (cookies + URLs)
+5 -1
View File
@@ -648,7 +648,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"health","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+7
View File
@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Bash shim — Claude Code hooks run `command` strings via /bin/sh, so this
# wrapper makes the TypeScript hook executable via bun. Settings.json
# references this file directly.
set -e
HERE="$(cd "$(dirname "$0")" && pwd)"
exec bun "$HERE/question-log-hook.ts"
+289
View File
@@ -0,0 +1,289 @@
#!/usr/bin/env bun
/**
* PostToolUse hook for AskUserQuestion (Claude Code, plan-tune cathedral T5).
*
* Reads hook stdin JSON, extracts every AUQ question + user choice from the
* tool_input/tool_response, and writes them via gstack-question-log so the
* substrate captures fires deterministically no agent compliance required.
*
* Triggered by ~/.claude/settings.json:
* {
* "hooks": {
* "PostToolUse": [
* {
* "matcher": "(AskUserQuestion|mcp__.*__AskUserQuestion)",
* "hooks": [
* { "type": "command",
* "command": "$CLAUDE_PROJECT_DIR/.claude/skills/gstack/hosts/claude/hooks/question-log-hook",
* "timeout": 5 }
* ]
* }
* ]
* }
* }
*
* Invariants:
* - Always exits 0. A failing hook MUST NOT block the user's session.
* Errors land in ~/.gstack/hook-errors.log for postmortem.
* - Spawns gstack-question-log as a subprocess; that bin handles
* validation, dedup (source+tool_use_id), async derive.
* - Marker-first question_id (`<gstack-qid:foo-bar>`), hash fallback
* (D18 progressive markers).
*
* See docs/spikes/claude-code-hook-mutation.md for the protocol contract.
*/
import * as crypto from 'crypto';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
import { spawnSync } from 'child_process';
interface HookStdin {
session_id?: string;
hook_event_name?: string;
tool_name?: string;
tool_use_id?: string;
tool_input?: {
questions?: Array<{
question?: string;
options?: Array<string | { label?: string; description?: string }>;
multiSelect?: boolean;
}>;
};
tool_response?: unknown;
cwd?: string;
}
interface ExtractedQuestion {
question_id: string;
question_summary: string;
options_count: number;
user_choice: string;
recommended?: string;
free_text?: string;
category?: string;
door_type?: string;
}
const MARKER_RE = /<gstack-qid:([a-z0-9-]{1,64})>/i;
const RECOMMENDED_LABEL_RE = /\(recommended\)\s*$/i;
function logHookError(msg: string): void {
try {
const stateRoot =
process.env.GSTACK_STATE_ROOT ||
process.env.GSTACK_HOME ||
path.join(os.homedir(), '.gstack');
fs.mkdirSync(stateRoot, { recursive: true });
fs.appendFileSync(
path.join(stateRoot, 'hook-errors.log'),
`${new Date().toISOString()} question-log-hook: ${msg}\n`,
);
} catch {
// Last-resort: swallow. Hook must not block.
}
}
function readStdin(): Promise<string> {
return new Promise((resolve) => {
let buf = '';
process.stdin.setEncoding('utf-8');
process.stdin.on('data', (chunk) => (buf += chunk));
process.stdin.on('end', () => resolve(buf));
process.stdin.on('error', () => resolve(buf));
// Hard cutoff so we don't hang the user's session waiting for stdin.
setTimeout(() => resolve(buf), 2000);
});
}
function hashQuestionId(skill: string, question: string, options: string[]): string {
const sorted = [...options].sort().join('|');
const h = crypto
.createHash('sha1')
.update(`${skill}::${question}::${sorted}`)
.digest('hex');
return `hook-${h.slice(0, 10)}`;
}
/**
* Marker-first id extraction. Returns the marker id (stripped of the
* <gstack-qid:...> wrapper) when present, else a hash-based hook- id.
* Per D18 progressive markers hash ids are observed-only, never used
* as preference keys.
*/
function extractQuestionId(
skill: string,
questionText: string,
options: string[],
): { id: string; marker_present: boolean; stripped_question: string } {
const match = questionText.match(MARKER_RE);
if (match) {
return {
id: match[1],
marker_present: true,
stripped_question: questionText.replace(MARKER_RE, '').trim(),
};
}
return {
id: hashQuestionId(skill, questionText, options),
marker_present: false,
stripped_question: questionText,
};
}
function optionLabels(opts: Array<string | { label?: string; description?: string }>): string[] {
return opts.map((o) => (typeof o === 'string' ? o : o.label || o.description || ''));
}
/**
* Parse "(recommended)" label-first per D2; fall back to "Recommendation: X"
* prose match; refuse (return undefined) if ambiguous.
*/
function extractRecommended(questionText: string, opts: string[]): string | undefined {
const labelMatches = opts.filter((o) => RECOMMENDED_LABEL_RE.test(o));
if (labelMatches.length === 1) return labelMatches[0].replace(RECOMMENDED_LABEL_RE, '').trim();
if (labelMatches.length > 1) return undefined; // ambiguous
const m = questionText.match(/Recommendation:\s*([^\n]+)/i);
if (!m) return undefined;
const recPhrase = m[1].trim();
const matchByPrefix = opts.find((o) => o.toLowerCase().startsWith(recPhrase.toLowerCase().slice(0, 12)));
return matchByPrefix;
}
/**
* Best-effort extraction of which option the user picked per question.
* AUQ tool_response shape varies by Claude Code variant (native vs MCP),
* and the hook stdin docs don't pin a single canonical shape. We handle
* the common cases gracefully.
*/
function extractUserChoices(
response: unknown,
questionCount: number,
): Array<{ choice: string; free_text?: string }> {
const out: Array<{ choice: string; free_text?: string }> = [];
if (!response) {
for (let i = 0; i < questionCount; i++) out.push({ choice: '__unknown__' });
return out;
}
// Shape A: { answers: [{option_label, free_text?}] }
// Shape B: { questions: [{user_answer}] }
// Shape C: { content: [...] } or array.
// We probe lazily.
const rec = response as Record<string, unknown>;
if (Array.isArray(rec.answers)) {
for (const a of rec.answers as Array<Record<string, unknown>>) {
const choice = (a.option_label || a.label || a.choice || a.answer || '__unknown__') as string;
const freeText = (a.free_text || a.other_text) as string | undefined;
out.push(freeText ? { choice, free_text: freeText } : { choice });
}
while (out.length < questionCount) out.push({ choice: '__unknown__' });
return out;
}
if (Array.isArray(rec.questions)) {
for (const q of rec.questions as Array<Record<string, unknown>>) {
const choice = (q.user_answer || q.answer || q.choice || '__unknown__') as string;
out.push({ choice });
}
while (out.length < questionCount) out.push({ choice: '__unknown__' });
return out;
}
// Fall back: stringify and log first 100 chars to help future debugging.
for (let i = 0; i < questionCount; i++) {
out.push({ choice: `__response-shape-unknown:${JSON.stringify(response).slice(0, 80)}__` });
}
return out;
}
function detectSkill(cwd: string | undefined): string {
// Best-effort: cwd often contains the project slug but rarely the running
// skill. Without a session-state mechanism, leave as 'unknown' — the
// skill marker (<gstack-skill:NAME>) embedded in question text per
// future plan-tune work is the durable path.
void cwd;
return 'unknown';
}
function spawnLog(payload: Record<string, unknown>, cwd?: string): void {
// Locate the bin relative to this script's directory.
const here = path.dirname(new URL(import.meta.url).pathname);
// hosts/claude/hooks/ -> ../../../bin/
const repoRoot = path.resolve(here, '..', '..', '..');
const bin = path.join(repoRoot, 'bin', 'gstack-question-log');
const res = spawnSync(bin, [JSON.stringify(payload)], {
encoding: 'utf-8',
stdio: ['ignore', 'pipe', 'pipe'],
timeout: 3000,
// Run from the originating tool call's cwd so gstack-slug resolves to
// the project the user is actually in, not the hook script's location.
cwd: cwd && fs.existsSync(cwd) ? cwd : undefined,
});
if (res.status !== 0) {
logHookError(`gstack-question-log exited ${res.status}: ${res.stderr || res.stdout}`);
}
}
async function main(): Promise<void> {
const raw = await readStdin();
if (!raw.trim()) {
process.exit(0);
}
let stdin: HookStdin;
try {
stdin = JSON.parse(raw);
} catch (e) {
logHookError(`stdin parse failed: ${(e as Error).message}`);
process.exit(0);
}
const toolName = stdin.tool_name || '';
if (
toolName !== 'AskUserQuestion' &&
!toolName.match(/^mcp__.+__AskUserQuestion$/)
) {
// Matcher should have filtered this out; defensive no-op.
process.exit(0);
}
const questions = stdin.tool_input?.questions || [];
if (questions.length === 0) {
process.exit(0);
}
const skill = detectSkill(stdin.cwd);
const choices = extractUserChoices(stdin.tool_response, questions.length);
for (let i = 0; i < questions.length; i++) {
const q = questions[i];
const qText = q.question || '';
if (!qText) continue;
const opts = optionLabels(q.options || []);
const { id, stripped_question } = extractQuestionId(skill, qText, opts);
const recommended = extractRecommended(stripped_question, opts);
const summary = stripped_question.slice(0, 200);
const choice = choices[i] || { choice: '__unknown__' };
const payload: Record<string, unknown> = {
skill,
question_id: id,
question_summary: summary,
options_count: opts.length,
user_choice: String(choice.choice).slice(0, 64),
source: choice.free_text ? 'auq-other' : 'hook',
session_id: stdin.session_id?.slice(0, 64),
tool_use_id: stdin.tool_use_id?.slice(0, 128),
};
if (recommended) payload.recommended = recommended.slice(0, 64);
if (choice.free_text) payload.free_text = String(choice.free_text);
spawnLog(payload, stdin.cwd);
}
process.exit(0);
}
main().catch((e) => {
logHookError(`main crash: ${(e as Error).message}`);
process.exit(0);
});
+7
View File
@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Bash shim — Claude Code hooks run `command` strings via /bin/sh, so this
# wrapper makes the TypeScript hook executable via bun. Settings.json
# references this file directly.
set -e
HERE="$(cd "$(dirname "$0")" && pwd)"
exec bun "$HERE/question-preference-hook.ts"
@@ -0,0 +1,459 @@
#!/usr/bin/env bun
/**
* PreToolUse hook for AskUserQuestion (Claude Code, plan-tune cathedral T6).
*
* Enforces never-ask / always-ask / ask-only-for-one-way preferences
* deterministically no agent compliance required.
*
* Decision tree (per question in tool_input.questions):
* 1. Extract question_id via marker (<gstack-qid:foo-bar>). If no marker,
* enforcement is skipped for this question (D18 hash IDs are
* observed-only, never used as preference keys).
* 2. Look up door_type from scripts/question-registry.ts (default two-way).
* 3. Read preferences with precedence: project-local > global (D8).
* 4. Apply:
* never-ask + one-way defer (safety override; one-way always asks).
* never-ask + two-way + marker deny with auto-decided recommendation
* in reason. Mark tool_use_id so PostToolUse logs as 'auto-decided'.
* ask-only-for-one-way + two-way + marker same as never-ask.
* always-ask, or no preference defer.
*
* Why deny+reason instead of allow+updatedInput:
* AskUserQuestion's `updatedInput` shape for "pre-resolve this question"
* isn't structurally pinned in Claude Code docs (spike T4 left as open
* question). `deny` with a reason that names the auto-decided option is
* conservative + reliable: the model receives the rejection feedback,
* reads the recommended option from the reason, and proceeds without
* re-firing AUQ. When the spike around input mutation lands, we can
* swap to allow+updatedInput without changing the contract.
*
* Recommended-option extraction (per D2):
* - First: (recommended) label suffix on an option.
* - Fall back: "Recommendation: X" prose match against option labels.
* - Refuse to auto-decide if ambiguous (multiple labels OR no parseable
* recommendation): defer instead of silent-wrong.
*
* Always exits 0. Hook errors land in ~/.gstack/hook-errors.log.
* See docs/spikes/claude-code-hook-mutation.md for the protocol contract.
*/
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
import { spawnSync } from 'child_process';
interface HookStdin {
session_id?: string;
hook_event_name?: string;
tool_name?: string;
tool_use_id?: string;
tool_input?: {
questions?: Array<{
question?: string;
options?: Array<string | { label?: string; description?: string }>;
multiSelect?: boolean;
}>;
};
cwd?: string;
}
const MARKER_RE = /<gstack-qid:([a-z0-9-]{1,64})>/i;
const RECOMMENDED_LABEL_RE = /\(recommended\)\s*$/i;
function stateRoot(): string {
return (
process.env.GSTACK_STATE_ROOT ||
process.env.GSTACK_HOME ||
path.join(os.homedir(), '.gstack')
);
}
function logHookError(msg: string): void {
try {
const sr = stateRoot();
fs.mkdirSync(sr, { recursive: true });
fs.appendFileSync(
path.join(sr, 'hook-errors.log'),
`${new Date().toISOString()} question-preference-hook: ${msg}\n`,
);
} catch {
// last-resort swallow
}
}
function readStdin(): Promise<string> {
return new Promise((resolve) => {
let buf = '';
process.stdin.setEncoding('utf-8');
process.stdin.on('data', (chunk) => (buf += chunk));
process.stdin.on('end', () => resolve(buf));
process.stdin.on('error', () => resolve(buf));
setTimeout(() => resolve(buf), 2000);
});
}
function defer(additionalContext?: string): void {
const out: Record<string, unknown> = {
hookEventName: 'PreToolUse',
permissionDecision: 'defer',
};
if (additionalContext) out.additionalContext = additionalContext;
process.stdout.write(JSON.stringify({ hookSpecificOutput: out }));
process.exit(0);
}
function deny(reason: string): void {
process.stdout.write(
JSON.stringify({
hookSpecificOutput: {
hookEventName: 'PreToolUse',
permissionDecision: 'deny',
permissionDecisionReason: reason,
},
}),
);
process.exit(0);
}
function readJsonSafe(filePath: string): Record<string, unknown> | null {
try {
return JSON.parse(fs.readFileSync(filePath, 'utf-8'));
} catch {
return null;
}
}
interface PreferenceLookup {
preference: string | undefined;
source: 'project' | 'global' | 'none';
}
function lookupPreference(slug: string, questionId: string): PreferenceLookup {
const sr = stateRoot();
const projectFile = path.join(sr, 'projects', slug, 'question-preferences.json');
const globalFile = path.join(sr, 'global-question-preferences.json');
const project = readJsonSafe(projectFile);
if (project && typeof project[questionId] === 'string') {
return { preference: project[questionId] as string, source: 'project' };
}
const global = readJsonSafe(globalFile);
if (global && typeof global[questionId] === 'string') {
return { preference: global[questionId] as string, source: 'global' };
}
return { preference: undefined, source: 'none' };
}
interface RegistryEntry {
id: string;
door_type?: 'one-way' | 'two-way';
signal_key?: string;
}
interface MemoryNugget {
nugget: string;
applies_to_signal_keys: string[];
applied_at?: string;
}
/**
* Read per-session cache first, fall back to canonical local file. Cache
* invalidates by being missing gstack-distill-apply doesn't touch the
* cache because the canonical file is always the source-of-truth on read
* miss. Sub-1ms cache reads (D13 perf).
*/
function loadMemoryNuggets(sessionId: string | undefined): MemoryNugget[] {
const sr = stateRoot();
const canonical = path.join(sr, 'free-text-memory.json');
let nuggets: MemoryNugget[] | null = null;
if (sessionId) {
const cachePath = path.join(sr, 'sessions', sessionId, 'memory-cache.json');
try {
const cached = JSON.parse(fs.readFileSync(cachePath, 'utf-8'));
if (Array.isArray(cached.nuggets)) {
return cached.nuggets;
}
} catch {
// miss → fall through
}
}
try {
const j = JSON.parse(fs.readFileSync(canonical, 'utf-8'));
nuggets = Array.isArray(j.nuggets) ? j.nuggets : [];
} catch {
nuggets = [];
}
// Write through to the per-session cache so subsequent hooks on this
// session take the fast path. Best-effort; never fails the hook.
if (sessionId && nuggets) {
try {
const dir = path.join(sr, 'sessions', sessionId);
fs.mkdirSync(dir, { recursive: true });
fs.writeFileSync(
path.join(dir, 'memory-cache.json'),
JSON.stringify({ nuggets, cached_at: new Date().toISOString() }, null, 2),
);
} catch {
// swallow
}
}
return nuggets || [];
}
/**
* For a given signal_key, return up to N nuggets whose applies_to_signal_keys
* include it. Sorted by recency (most-recently-applied first), capped.
*/
function nuggetsForSignal(nuggets: MemoryNugget[], signalKey: string, max = 3): string[] {
return nuggets
.filter((n) => Array.isArray(n.applies_to_signal_keys) && n.applies_to_signal_keys.includes(signalKey))
.sort((a, b) => (b.applied_at || '').localeCompare(a.applied_at || ''))
.slice(0, max)
.map((n) => n.nugget);
}
let registryCache: Record<string, RegistryEntry> | null = null;
function loadRegistry(): Record<string, RegistryEntry> {
if (registryCache) return registryCache;
registryCache = {};
try {
// Hook lives at hosts/claude/hooks/; registry at scripts/question-registry.ts
const here = path.dirname(new URL(import.meta.url).pathname);
const repoRoot = path.resolve(here, '..', '..', '..');
const regPath = path.join(repoRoot, 'scripts', 'question-registry.ts');
if (!fs.existsSync(regPath)) return registryCache;
const src = fs.readFileSync(regPath, 'utf-8');
// Cheap regex extraction so the hook doesn't need to import the TS file
// (which would require bun resolving the module at hook-invocation time).
// Matches entries like:
// 'ship-test-failure-triage': {
// id: 'ship-test-failure-triage',
// ...
// door_type: 'one-way',
// signal_key: 'test-discipline',
// ...
// },
const blockRe =
/'([a-z0-9-]+)':\s*\{[^}]*?door_type:\s*'(one-way|two-way)'[^}]*?\}/g;
let m: RegExpExecArray | null;
while ((m = blockRe.exec(src))) {
const [block, id, door_type] = m;
const sk = block.match(/signal_key:\s*'([a-z0-9-]+)'/);
registryCache[id] = {
id,
door_type: door_type as 'one-way' | 'two-way',
signal_key: sk ? sk[1] : undefined,
};
}
} catch (e) {
logHookError(`registry load failed: ${(e as Error).message}`);
}
return registryCache;
}
function optionLabels(opts: Array<string | { label?: string; description?: string }>): string[] {
return opts.map((o) => (typeof o === 'string' ? o : o.label || o.description || ''));
}
function extractRecommended(
questionText: string,
opts: string[],
): { recommended: string | undefined; ambiguous: boolean } {
const labelMatches = opts.filter((o) => RECOMMENDED_LABEL_RE.test(o));
if (labelMatches.length === 1) {
return { recommended: labelMatches[0].replace(RECOMMENDED_LABEL_RE, '').trim(), ambiguous: false };
}
if (labelMatches.length > 1) return { recommended: undefined, ambiguous: true };
const m = questionText.match(/Recommendation:\s*([^\n]+)/i);
if (!m) return { recommended: undefined, ambiguous: false };
const recPhrase = m[1].trim();
const prefixMatches = opts.filter((o) =>
o.toLowerCase().startsWith(recPhrase.toLowerCase().slice(0, 12)),
);
if (prefixMatches.length === 1) return { recommended: prefixMatches[0], ambiguous: false };
if (prefixMatches.length > 1) return { recommended: undefined, ambiguous: true };
return { recommended: undefined, ambiguous: false };
}
function slugFromCwd(cwd: string | undefined): string {
// Mirror gstack-slug's basename fallback. The full slug resolver shells out
// to git, which is too expensive on a hot hook path; the basename is close
// enough for preference lookup (preferences are keyed by question_id, slug
// is just the directory bucket).
if (!cwd) return 'unknown';
return path.basename(cwd);
}
function markAutoDecided(sessionId: string | undefined, toolUseId: string | undefined): void {
if (!sessionId || !toolUseId) return;
try {
const sr = stateRoot();
const dir = path.join(sr, 'sessions', sessionId);
fs.mkdirSync(dir, { recursive: true });
fs.writeFileSync(path.join(dir, `.auto-decided-${toolUseId}`), '');
} catch (e) {
logHookError(`markAutoDecided failed: ${(e as Error).message}`);
}
}
/**
* Log an auto-decided event directly from PreToolUse, since `deny` prevents
* the tool from running and PostToolUse never fires. Without this, /plan-tune
* Recent auto-decisions would be blind to enforcement hits.
*/
function logAutoDecided(
questionId: string,
questionSummary: string,
recommended: string,
optionsCount: number,
sessionId: string | undefined,
toolUseId: string | undefined,
cwd: string | undefined,
): void {
try {
const here = path.dirname(new URL(import.meta.url).pathname);
const repoRoot = path.resolve(here, '..', '..', '..');
const bin = path.join(repoRoot, 'bin', 'gstack-question-log');
const payload: Record<string, unknown> = {
skill: 'unknown',
question_id: questionId,
question_summary: questionSummary.slice(0, 200),
options_count: optionsCount,
user_choice: recommended.slice(0, 64),
recommended: recommended.slice(0, 64),
source: 'auto-decided',
session_id: sessionId?.slice(0, 64),
tool_use_id: toolUseId?.slice(0, 128),
};
spawnSync(bin, [JSON.stringify(payload)], {
encoding: 'utf-8',
stdio: ['ignore', 'pipe', 'pipe'],
timeout: 3000,
// cwd of the originating tool call so gstack-slug resolves to the
// project the user is actually in, not the hook script's location.
cwd: cwd && fs.existsSync(cwd) ? cwd : undefined,
});
} catch (e) {
logHookError(`logAutoDecided failed: ${(e as Error).message}`);
}
}
async function main(): Promise<void> {
const raw = await readStdin();
if (!raw.trim()) {
defer();
return;
}
let stdin: HookStdin;
try {
stdin = JSON.parse(raw);
} catch (e) {
logHookError(`stdin parse failed: ${(e as Error).message}`);
defer();
return;
}
const toolName = stdin.tool_name || '';
if (
toolName !== 'AskUserQuestion' &&
!toolName.match(/^mcp__.+__AskUserQuestion$/)
) {
defer();
return;
}
const questions = stdin.tool_input?.questions || [];
if (questions.length === 0) {
defer();
return;
}
// For multi-question AUQ, enforcement is all-or-nothing per call:
// we deny only if ALL questions have marker + never-ask + safe door type.
// Mixed cases pass through (defer) so the user still gets to answer.
const registry = loadRegistry();
const slug = slugFromCwd(stdin.cwd);
const memoryNuggets = loadMemoryNuggets(stdin.session_id);
// Compute Layer 8 memory context inline: any nuggets matching the
// signal_keys of the questions in this AUQ get surfaced as additionalContext.
// This applies whether we defer OR deny — gives the agent + user the
// relevant prior context either way.
const contextNuggets: string[] = [];
for (const q of questions) {
const qText = q.question || '';
const marker = qText.match(MARKER_RE);
if (!marker) continue;
const entry = registry[marker[1]];
if (!entry?.signal_key) continue;
const hits = nuggetsForSignal(memoryNuggets, entry.signal_key);
for (const h of hits) {
if (!contextNuggets.includes(h)) contextNuggets.push(h);
}
}
const memoryContext = contextNuggets.length
? '[plan-tune memory] Past answers suggest: ' + contextNuggets.join(' | ')
: undefined;
const autoDecisions: Array<{ id: string; recommended: string }> = [];
for (const q of questions) {
const qText = q.question || '';
const marker = qText.match(MARKER_RE);
if (!marker) {
defer(memoryContext);
return;
}
const questionId = marker[1];
const pref = lookupPreference(slug, questionId);
if (!pref.preference || pref.preference === 'always-ask') {
defer(memoryContext);
return;
}
const entry = registry[questionId];
const doorType = entry?.door_type || 'two-way';
if (doorType === 'one-way') {
// Safety override — even never-ask doesn't bypass one-way doors.
defer(memoryContext);
return;
}
const opts = optionLabels(q.options || []);
const { recommended, ambiguous } = extractRecommended(qText, opts);
if (!recommended || ambiguous) {
// Refuse-on-ambiguous per D2 — fail safe, ask normally.
defer(memoryContext);
return;
}
autoDecisions.push({ id: questionId, recommended });
}
// All questions were eligible for enforcement.
markAutoDecided(stdin.session_id, stdin.tool_use_id);
// Log each auto-decided question now, since deny prevents PostToolUse from
// firing. /plan-tune Recent auto-decisions reads source=auto-decided events.
for (let i = 0; i < autoDecisions.length; i++) {
const d = autoDecisions[i];
const q = questions[i];
const qText = (q.question || '').replace(MARKER_RE, '').trim();
const opts = optionLabels(q.options || []);
logAutoDecided(d.id, qText, d.recommended, opts.length, stdin.session_id, stdin.tool_use_id, stdin.cwd);
}
const reasonLines = autoDecisions.map(
(d) =>
`[plan-tune auto-decide] ${d.id}${d.recommended} (your never-ask preference). Proceed with that option without re-prompting. Change with /plan-tune.`,
);
deny(reasonLines.join('\n'));
}
main().catch((e) => {
logHookError(`main crash: ${(e as Error).message}`);
defer();
});
+5 -1
View File
@@ -687,7 +687,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"investigate","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -650,7 +650,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"ios-clean","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -652,7 +652,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"ios-design-review","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -653,7 +653,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"ios-fix","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -656,7 +656,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"ios-qa","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -650,7 +650,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"ios-sync","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -645,7 +645,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"land-and-deploy","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -646,7 +646,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"landing-report","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -648,7 +648,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"learn","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -683,7 +683,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"office-hours","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -645,7 +645,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"open-gstack-browser","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+1 -1
View File
@@ -1,6 +1,6 @@
{
"name": "gstack",
"version": "1.51.1.0",
"version": "1.52.1.0",
"description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
"license": "MIT",
"type": "module",
+5 -1
View File
@@ -647,7 +647,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"pair-agent","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -677,7 +677,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"plan-ceo-review","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -649,7 +649,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"plan-design-review","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -655,7 +655,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"plan-devex-review","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -653,7 +653,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"plan-eng-review","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+305 -27
View File
@@ -658,7 +658,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"plan-tune","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
@@ -744,50 +748,87 @@ Canonical reference: `docs/designs/PLAN_TUNING_V0.md`.
## Step 0: Detect what the user wants
Read the user's message. Route based on plain-English intent, not keywords:
Read the user's message. Route based on plain-English intent, not keywords.
1. **First-time use** (config says `question_tuning` is not yet set to `true`) →
run `Enable + setup` below.
2. **"Show my profile" / "what do you know about me" / "show my vibe"** →
**Implicit gates run first** (before user-intent routing). These exist so first-time
users see the consent prompt, so explicit opt-ins eventually run the 5-Q setup,
and so accumulated free-text answers get dream-cycled into actionable proposals.
Each gate is guarded by a marker so the user is prompted at most once per choice.
1. **Consent gate.** If `question_tuning` is `false` AND
`~/.gstack/.question-tuning-prompted` is missing → run `Consent + opt-in`
below. Honor the answer with a marker write either way; do not re-prompt.
2. **Setup gate.** If `question_tuning` is `true` AND
`~/.gstack/developer-profile.json`'s `declared` object is empty AND
`~/.gstack/.declared-setup-prompted` is missing → run `5-Q setup` below.
Touch the marker after setup completes OR is declined.
3. **Dream-cycle gate (Layer 8 / cathedral T10/T11).** If
`~/.gstack/projects/<slug>/distillation-proposals.json` exists AND has
`applied_at` missing on any proposal → run `Dream cycle review` below.
Marker: each proposal carries its own `applied_at` so re-firing this
gate naturally skips already-handled items.
When no implicit gate fires, route by user intent:
4. **"Show my profile" / "what do you know about me" / "show my vibe"** →
run `Inspect profile`.
3. **"Review questions" / "what have I been asked" / "show recent"** →
5. **"Review questions" / "what have I been asked" / "show recent"** →
run `Review question log`.
4. **"Stop asking me about X" / "never ask about Y" / "tune: ..."** →
6. **"Stop asking me about X" / "never ask about Y" / "tune: ..."** →
run `Set a preference`.
5. **"Update my profile" / "I'm more boil-the-ocean than that" / "I've changed
7. **"Update my profile" / "I'm more boil-the-ocean than that" / "I've changed
my mind"** → run `Edit declared profile` (confirm before writing).
6. **"Show the gap" / "how far off is my profile"** → run `Show gap`.
7. **"Turn it off" / "disable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning false`
8. **"Turn it on" / "enable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning true`
9. **Clear ambiguity** — if you can't tell what the user wants, ask plainly:
"Do you want to (a) see your profile, (b) review recent questions, (c) set
a preference, (d) update your declared profile, or (e) turn it off?"
8. **"Show the gap" / "how far off is my profile"** → run `Show gap`.
9. **"Dream cycle" / "distill" / "what have I been free-texting"** →
run `Dream cycle distill` below (triggers `gstack-distill-free-text`).
10. **"Turn it off" / "disable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning false`
11. **"Turn it on" / "enable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning true && touch ~/.gstack/.question-tuning-prompted`
12. **Clear ambiguity** — if you can't tell what the user wants, ask plainly:
"Do you want to (a) see your profile, (b) review recent questions, (c) set
a preference, (d) update your declared profile, (e) run the dream cycle,
or (f) turn it off?"
Power-user shortcuts (one-word invocations) — handle these too:
`profile`, `vibe`, `gap`, `stats`, `review`, `enable`, `disable`, `setup`.
`profile`, `vibe`, `gap`, `stats`, `review`, `enable`, `disable`, `setup`,
`distill`, `dream`, `audit`.
---
## Enable + setup (first-time flow)
## Consent + opt-in
**When this fires.** The user invokes `/plan-tune` and the preamble shows
`QUESTION_TUNING: false` (the default).
**When this fires.** Step 0's consent gate: `question_tuning` is `false` AND
`~/.gstack/.question-tuning-prompted` is missing. The user has never been
asked.
**Privacy note.** gstack defaults `question_tuning` to `false` for every user.
There is no auto-flip for any cohort. The consent prompt is the only path to
enabling, and the answer is honored with a marker file so the user is never
re-asked. Contributors are not auto-enrolled (see
`docs/designs/PLAN_TUNING_V1.md` §"Decisions log" for the privacy posture
rationale). If the user is a contributor (`gstack_contributor: true`), the
prompt can mention it as additional context, but the decision is still
explicit.
**Flow:**
1. Read the current state:
1. Detect contributor state (for prompt framing only, not for auto-action):
```bash
_QT=$(~/.claude/skills/gstack/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || echo "false")
echo "QUESTION_TUNING: $_QT"
echo "CONTRIBUTOR: $_CONTRIB"
```
2. If `false`, use AskUserQuestion:
2. AskUserQuestion (use the contributor-specific framing only if `_CONTRIB=true`,
otherwise use the general framing):
**General framing:**
> Question tuning is off. gstack can learn which of its prompts you find
> valuable vs noisy — so over time, gstack stops asking questions you've
> already answered the same way. It takes about 2 minutes to set up your
> initial profile. v1 is observational: gstack tracks your preferences
> and shows you a profile, but doesn't silently change skill behavior yet.
> Logs stay local (`~/.gstack/projects/<slug>/question-log.jsonl`).
>
> RECOMMENDATION: Enable and set up your profile. Completeness: A=9/10.
>
@@ -795,13 +836,47 @@ Power-user shortcuts (one-word invocations) — handle these too:
> B) Enable but skip setup (I'll fill it in later)
> C) Cancel — I'm not ready
3. If A or B: enable:
**Contributor framing (only if `_CONTRIB=true`):**
> You're a gstack contributor. Question tuning isn't on by default for
> anyone, but contributors are the cohort whose data most helps v2 work
> (skills adapting to your steering style). Enabling logs every
> AskUserQuestion outcome locally to
> `~/.gstack/projects/<slug>/question-log.jsonl` — nothing leaves your
> machine. v1 is observational only.
>
> RECOMMENDATION: Enable and set up your profile. Completeness: A=9/10.
>
> A) Enable + set up (recommended for contributors, ~2 min)
> B) Enable but skip setup (I'll fill it in later)
> C) Cancel — I'm not ready
3. ALWAYS touch the marker, regardless of choice:
```bash
touch ~/.gstack/.question-tuning-prompted
```
4. If A or B: enable:
```bash
~/.claude/skills/gstack/bin/gstack-config set question_tuning true
```
4. If A (full setup), ask FIVE one-per-dimension declaration questions via
individual AskUserQuestion calls (one at a time). Use plain English, no jargon:
5. If C: do nothing else. Tell the user: "Question tuning stays off. Re-enable
any time with `/plan-tune enable` or `gstack-config set question_tuning true`."
## 5-Q setup (post-consent, or via Setup gate)
**When this fires.** Two paths:
- Right after the consent prompt above accepts option A.
- Standalone via Step 0's setup gate: `question_tuning` is already `true`
(user opted in via gstack-config or earlier `/plan-tune enable`) AND
`declared` is empty AND `~/.gstack/.declared-setup-prompted` is missing.
This catches users who set `question_tuning: true` directly without
running the wizard.
**Flow:**
1. Ask FIVE one-per-dimension declaration questions via individual
AskUserQuestion calls (one at a time). Use plain English, no jargon:
**Q1 — scope_appetite:** "When you're planning a feature, do you lean toward
shipping the smallest useful version fast, or building the complete, edge-
@@ -854,10 +929,18 @@ Power-user shortcuts (one-word invocations) — handle these too:
"
```
5. Tell the user: "Profile set. Question tuning is now on. Use `/plan-tune`
2. Touch the marker so the Setup gate doesn't re-fire:
```bash
touch ~/.gstack/.declared-setup-prompted
```
Touch it even if the user bails out partway — they were asked; they chose
not to complete. The Setup gate respects that. They can rerun the 5-Q
anytime with `/plan-tune setup` (Step 0 power-user shortcut).
3. Tell the user: "Profile set. Question tuning is on. Use `/plan-tune`
again any time to inspect, adjust, or turn it off."
6. Show the profile inline as a confirmation (see `Inspect profile` below).
4. Show the profile inline as a confirmation (see `Inspect profile` below).
---
@@ -878,12 +961,18 @@ Parse the JSON. Present in **plain English**, not raw floats:
Format: "**scope_appetite:** 0.8 (boil the ocean — you prefer the complete
version with edge cases covered)"
- If `inferred.diversity` passes the calibration gate (`sample_size >= 20 AND
- If `inferred.diversity` passes the **display gate** (`sample_size >= 20 AND
skills_covered >= 3 AND question_ids_covered >= 8 AND days_span >= 7`), show
the inferred column next to declared:
"**scope_appetite:** declared 0.8 (boil the ocean) ↔ observed 0.72 (close)"
Use words for the gap: 0.0-0.1 "close", 0.1-0.3 "drift", 0.3+ "mismatch".
This display gate is intentionally lower than the E1 **promotion gate**
(90+ days stable across 3+ skills, per `docs/designs/PLAN_TUNING_V0.md`).
Displaying inferred values is a UI affordance; shipping behavior-adapting
defaults based on the profile is consequential and needs a much higher
bar. Do NOT use the display gate as a green light for v2 E1 work.
- If the calibration gate isn't met, say: "Not enough observed data yet —
need N more events across M more skills before we can show your observed
profile."
@@ -1031,12 +1120,37 @@ the user decides whether declared is wrong or behavior is wrong.
## Stats
Cathedral T13 surfaces: host-aware breakdown (claude hook vs codex import
vs agent-enriched), marked vs hash-only, auto-decided count, and dream
cycle cost-to-date.
```bash
~/.claude/skills/gstack/bin/gstack-question-preference --stats
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
_LOG="$GSTACK_STATE_ROOT/projects/$SLUG/question-log.jsonl"
[ -f "$_LOG" ] && echo "TOTAL_LOGGED: $(wc -l < "$_LOG" | tr -d ' ')" || echo "TOTAL_LOGGED: 0"
if [ -f "$_LOG" ]; then
bun -e "
const lines = require('fs').readFileSync('$_LOG','utf-8').trim().split('\n').filter(Boolean);
const events = [];
for (const l of lines) { try { events.push(JSON.parse(l)); } catch {} }
const total = events.length;
const bySource = {};
let marked = 0;
for (const e of events) {
const src = e.source || 'agent';
bySource[src] = (bySource[src] || 0) + 1;
if (e.question_id && !e.question_id.startsWith('hook-')) marked++;
}
console.log('TOTAL_LOGGED: ' + total);
console.log('MARKED: ' + marked + ' (' + (total ? Math.round(100*marked/total) : 0) + '%)');
for (const s of Object.keys(bySource).sort()) {
console.log('SOURCE_' + s.toUpperCase().replace(/-/g,'_') + ': ' + bySource[s]);
}
"
else
echo 'TOTAL_LOGGED: 0'
fi
~/.claude/skills/gstack/bin/gstack-developer-profile --profile | bun -e "
const p = JSON.parse(await Bun.stdin.text());
const d = p.inferred?.diversity || {};
@@ -1045,10 +1159,174 @@ _LOG="$GSTACK_STATE_ROOT/projects/$SLUG/question-log.jsonl"
console.log('DAYS_SPAN: ' + (d.days_span ?? 0));
console.log('CALIBRATED: ' + (p.inferred?.sample_size >= 20 && d.skills_covered >= 3 && d.question_ids_covered >= 8 && d.days_span >= 7));
"
echo '---DISTILL---'
~/.claude/skills/gstack/bin/gstack-distill-free-text --status
```
Present as a compact summary with plain-English calibration status ("5 more
events across 2 more skills and you'll be calibrated" or "you're calibrated").
Surface the source breakdown so the user can see capture is real (Codex
correction — without source columns, the cathedral's "before:0 / after:>0"
claim is invisible).
---
## Recent auto-decisions
Show the last 10 questions where the PreToolUse hook auto-decided (source=
`auto-decided` in the log). Lets the user spot-check enforcement and flip
any that misfired via `always-ask`.
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
_LOG="$GSTACK_STATE_ROOT/projects/$SLUG/question-log.jsonl"
[ ! -f "$_LOG" ] && echo 'NO_LOG' || bun -e "
const lines = require('fs').readFileSync('$_LOG','utf-8').trim().split('\n').filter(Boolean);
const auto = [];
for (const l of lines) {
try { const e = JSON.parse(l); if (e.source === 'auto-decided') auto.push(e); } catch {}
}
const recent = auto.slice(-10).reverse();
if (!recent.length) { console.log('(no auto-decisions yet)'); process.exit(0); }
for (const r of recent) {
console.log(r.ts + ' ' + r.question_id + ' → ' + r.user_choice);
console.log(' ' + (r.question_summary || ''));
}
"
```
If any look wrong, offer: "Want to flip `<question_id>` to `always-ask`?"
Run `gstack-question-preference --write '{"question_id":"<id>","preference":
"always-ask","source":"plan-tune"}'` after Y.
---
## Audit unmarked questions
Top N hash-only question_ids by frequency. These are AUQ fires the cathedral
hook captured but cannot enforce against (no `<gstack-qid:foo>` marker in
the skill template — D18 progressive markers). Surfacing them drives marker
adoption: high-traffic unmarked questions are the next candidates to retrofit.
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
_LOG="$GSTACK_STATE_ROOT/projects/$SLUG/question-log.jsonl"
[ ! -f "$_LOG" ] && echo 'NO_LOG' || bun -e "
const lines = require('fs').readFileSync('$_LOG','utf-8').trim().split('\n').filter(Boolean);
const counts = {};
const summaries = {};
for (const l of lines) {
try {
const e = JSON.parse(l);
if (e.question_id && e.question_id.startsWith('hook-')) {
counts[e.question_id] = (counts[e.question_id] || 0) + 1;
summaries[e.question_id] = e.question_summary || '';
}
} catch {}
}
const rows = Object.entries(counts).sort((a,b) => b[1] - a[1]).slice(0, 10);
if (!rows.length) { console.log('(no unmarked questions — coverage is 100%)'); process.exit(0); }
for (const [id, n] of rows) {
console.log(n + 'x ' + id);
console.log(' ' + summaries[id]);
}
"
```
For each row, suggest where the marker should land (look up the skill from
the summary's wording, e.g. "Bundle this fix..." likely lives in
`ship/SKILL.md.tmpl`). Don't write markers without user approval — adding
markers changes which AUQ fires can be auto-decided, which is a substrate
expansion.
---
## Dream cycle review
**When this fires.** Step 0's dream-cycle gate: `distillation-proposals.json`
has at least one proposal with `applied_at` missing. Or the user explicitly
invokes via `/plan-tune distill` / `dream`.
**Flow:**
1. Show the proposals:
```bash
~/.claude/skills/gstack/bin/gstack-distill-apply --list
```
2. For each unapplied proposal, present it as a numbered item and use
AskUserQuestion (one per call, per skill convention). Show:
- Kind (`preference` / `declared-nudge` / `memory-nugget`)
- Confidence + rationale
- The source quotes verbatim (proves user-origin)
- What applying does (which file/key/dim changes)
3. **On accept** (Y): apply via the bin. The skill also publishes the
nugget to gbrain when configured.
For `memory-nugget`:
```bash
# If gbrain is configured, mirror via MCP first.
# (Pseudo — actual gbrain call happens at the agent layer via
# mcp__gbrain__put_page; the bin records the published flag.)
~/.claude/skills/gstack/bin/gstack-distill-apply --proposal N --gbrain-published true|false
```
For `preference`:
```bash
~/.claude/skills/gstack/bin/gstack-distill-apply --proposal N
```
For `declared-nudge`:
```bash
# Same bin; updates developer-profile.json declared dim with the
# clamped delta.
~/.claude/skills/gstack/bin/gstack-distill-apply --proposal N
```
4. **On decline**: skip without marking. User can re-decide later (the
proposal stays in the file). To dismiss permanently, manually clear:
`gstack-distill-apply --proposal N --dismiss` (not implemented in T11;
for now, regenerate via next distill run with corrected free-text).
5. **gbrain integration.** When `mcp__gbrain__*` tools are available in
this session:
- On `memory-nugget` apply: `mcp__gbrain__put_page` with the nugget +
`mcp__gbrain__extract_facts` + `mcp__gbrain__add_tag` per the cathedral
plan D9 routing. Then pass `--gbrain-published true` to the bin so
the proposals file records the mirror.
- When gbrain isn't configured (no MCP tools), the bin's local file
write is the durable source-of-truth and the PreToolUse hook reads it
via Layer 8 memory injection.
---
## Dream cycle distill (manual trigger)
**When this fires.** The user invokes `/plan-tune distill` / `dream` /
`distill` / `dream cycle`. Auto-triggered version lives in Step 0 gate #3.
**Flow:**
1. Run distill:
```bash
~/.claude/skills/gstack/bin/gstack-distill-free-text
```
2. If `RATE_CAPPED`: tell the user "You've hit today's 3 distills/day cap.
Run again tomorrow, or `/plan-tune stats` for run history."
3. If `NO_FREE_TEXT`: tell the user "No free-text answers since the last
distill. Keep using gstack — `Other` responses on AskUserQuestion feed
this loop."
4. If success: print the proposals count + estimated cost, then route into
`Dream cycle review` above for the user to approve each.
For background mode (e.g., the user wants to keep working):
```bash
~/.claude/skills/gstack/bin/gstack-distill-free-text --background
```
---
+300 -26
View File
@@ -52,50 +52,87 @@ Canonical reference: `docs/designs/PLAN_TUNING_V0.md`.
## Step 0: Detect what the user wants
Read the user's message. Route based on plain-English intent, not keywords:
Read the user's message. Route based on plain-English intent, not keywords.
1. **First-time use** (config says `question_tuning` is not yet set to `true`) →
run `Enable + setup` below.
2. **"Show my profile" / "what do you know about me" / "show my vibe"** →
**Implicit gates run first** (before user-intent routing). These exist so first-time
users see the consent prompt, so explicit opt-ins eventually run the 5-Q setup,
and so accumulated free-text answers get dream-cycled into actionable proposals.
Each gate is guarded by a marker so the user is prompted at most once per choice.
1. **Consent gate.** If `question_tuning` is `false` AND
`~/.gstack/.question-tuning-prompted` is missing → run `Consent + opt-in`
below. Honor the answer with a marker write either way; do not re-prompt.
2. **Setup gate.** If `question_tuning` is `true` AND
`~/.gstack/developer-profile.json`'s `declared` object is empty AND
`~/.gstack/.declared-setup-prompted` is missing → run `5-Q setup` below.
Touch the marker after setup completes OR is declined.
3. **Dream-cycle gate (Layer 8 / cathedral T10/T11).** If
`~/.gstack/projects/<slug>/distillation-proposals.json` exists AND has
`applied_at` missing on any proposal → run `Dream cycle review` below.
Marker: each proposal carries its own `applied_at` so re-firing this
gate naturally skips already-handled items.
When no implicit gate fires, route by user intent:
4. **"Show my profile" / "what do you know about me" / "show my vibe"** →
run `Inspect profile`.
3. **"Review questions" / "what have I been asked" / "show recent"** →
5. **"Review questions" / "what have I been asked" / "show recent"** →
run `Review question log`.
4. **"Stop asking me about X" / "never ask about Y" / "tune: ..."** →
6. **"Stop asking me about X" / "never ask about Y" / "tune: ..."** →
run `Set a preference`.
5. **"Update my profile" / "I'm more boil-the-ocean than that" / "I've changed
7. **"Update my profile" / "I'm more boil-the-ocean than that" / "I've changed
my mind"** → run `Edit declared profile` (confirm before writing).
6. **"Show the gap" / "how far off is my profile"** → run `Show gap`.
7. **"Turn it off" / "disable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning false`
8. **"Turn it on" / "enable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning true`
9. **Clear ambiguity** — if you can't tell what the user wants, ask plainly:
"Do you want to (a) see your profile, (b) review recent questions, (c) set
a preference, (d) update your declared profile, or (e) turn it off?"
8. **"Show the gap" / "how far off is my profile"** → run `Show gap`.
9. **"Dream cycle" / "distill" / "what have I been free-texting"** →
run `Dream cycle distill` below (triggers `gstack-distill-free-text`).
10. **"Turn it off" / "disable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning false`
11. **"Turn it on" / "enable"** → `~/.claude/skills/gstack/bin/gstack-config set question_tuning true && touch ~/.gstack/.question-tuning-prompted`
12. **Clear ambiguity** — if you can't tell what the user wants, ask plainly:
"Do you want to (a) see your profile, (b) review recent questions, (c) set
a preference, (d) update your declared profile, (e) run the dream cycle,
or (f) turn it off?"
Power-user shortcuts (one-word invocations) — handle these too:
`profile`, `vibe`, `gap`, `stats`, `review`, `enable`, `disable`, `setup`.
`profile`, `vibe`, `gap`, `stats`, `review`, `enable`, `disable`, `setup`,
`distill`, `dream`, `audit`.
---
## Enable + setup (first-time flow)
## Consent + opt-in
**When this fires.** The user invokes `/plan-tune` and the preamble shows
`QUESTION_TUNING: false` (the default).
**When this fires.** Step 0's consent gate: `question_tuning` is `false` AND
`~/.gstack/.question-tuning-prompted` is missing. The user has never been
asked.
**Privacy note.** gstack defaults `question_tuning` to `false` for every user.
There is no auto-flip for any cohort. The consent prompt is the only path to
enabling, and the answer is honored with a marker file so the user is never
re-asked. Contributors are not auto-enrolled (see
`docs/designs/PLAN_TUNING_V1.md` §"Decisions log" for the privacy posture
rationale). If the user is a contributor (`gstack_contributor: true`), the
prompt can mention it as additional context, but the decision is still
explicit.
**Flow:**
1. Read the current state:
1. Detect contributor state (for prompt framing only, not for auto-action):
```bash
_QT=$(~/.claude/skills/gstack/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || echo "false")
echo "QUESTION_TUNING: $_QT"
echo "CONTRIBUTOR: $_CONTRIB"
```
2. If `false`, use AskUserQuestion:
2. AskUserQuestion (use the contributor-specific framing only if `_CONTRIB=true`,
otherwise use the general framing):
**General framing:**
> Question tuning is off. gstack can learn which of its prompts you find
> valuable vs noisy — so over time, gstack stops asking questions you've
> already answered the same way. It takes about 2 minutes to set up your
> initial profile. v1 is observational: gstack tracks your preferences
> and shows you a profile, but doesn't silently change skill behavior yet.
> Logs stay local (`~/.gstack/projects/<slug>/question-log.jsonl`).
>
> RECOMMENDATION: Enable and set up your profile. Completeness: A=9/10.
>
@@ -103,13 +140,47 @@ Power-user shortcuts (one-word invocations) — handle these too:
> B) Enable but skip setup (I'll fill it in later)
> C) Cancel — I'm not ready
3. If A or B: enable:
**Contributor framing (only if `_CONTRIB=true`):**
> You're a gstack contributor. Question tuning isn't on by default for
> anyone, but contributors are the cohort whose data most helps v2 work
> (skills adapting to your steering style). Enabling logs every
> AskUserQuestion outcome locally to
> `~/.gstack/projects/<slug>/question-log.jsonl` — nothing leaves your
> machine. v1 is observational only.
>
> RECOMMENDATION: Enable and set up your profile. Completeness: A=9/10.
>
> A) Enable + set up (recommended for contributors, ~2 min)
> B) Enable but skip setup (I'll fill it in later)
> C) Cancel — I'm not ready
3. ALWAYS touch the marker, regardless of choice:
```bash
touch ~/.gstack/.question-tuning-prompted
```
4. If A or B: enable:
```bash
~/.claude/skills/gstack/bin/gstack-config set question_tuning true
```
4. If A (full setup), ask FIVE one-per-dimension declaration questions via
individual AskUserQuestion calls (one at a time). Use plain English, no jargon:
5. If C: do nothing else. Tell the user: "Question tuning stays off. Re-enable
any time with `/plan-tune enable` or `gstack-config set question_tuning true`."
## 5-Q setup (post-consent, or via Setup gate)
**When this fires.** Two paths:
- Right after the consent prompt above accepts option A.
- Standalone via Step 0's setup gate: `question_tuning` is already `true`
(user opted in via gstack-config or earlier `/plan-tune enable`) AND
`declared` is empty AND `~/.gstack/.declared-setup-prompted` is missing.
This catches users who set `question_tuning: true` directly without
running the wizard.
**Flow:**
1. Ask FIVE one-per-dimension declaration questions via individual
AskUserQuestion calls (one at a time). Use plain English, no jargon:
**Q1 — scope_appetite:** "When you're planning a feature, do you lean toward
shipping the smallest useful version fast, or building the complete, edge-
@@ -162,10 +233,18 @@ Power-user shortcuts (one-word invocations) — handle these too:
"
```
5. Tell the user: "Profile set. Question tuning is now on. Use `/plan-tune`
2. Touch the marker so the Setup gate doesn't re-fire:
```bash
touch ~/.gstack/.declared-setup-prompted
```
Touch it even if the user bails out partway — they were asked; they chose
not to complete. The Setup gate respects that. They can rerun the 5-Q
anytime with `/plan-tune setup` (Step 0 power-user shortcut).
3. Tell the user: "Profile set. Question tuning is on. Use `/plan-tune`
again any time to inspect, adjust, or turn it off."
6. Show the profile inline as a confirmation (see `Inspect profile` below).
4. Show the profile inline as a confirmation (see `Inspect profile` below).
---
@@ -186,12 +265,18 @@ Parse the JSON. Present in **plain English**, not raw floats:
Format: "**scope_appetite:** 0.8 (boil the ocean — you prefer the complete
version with edge cases covered)"
- If `inferred.diversity` passes the calibration gate (`sample_size >= 20 AND
- If `inferred.diversity` passes the **display gate** (`sample_size >= 20 AND
skills_covered >= 3 AND question_ids_covered >= 8 AND days_span >= 7`), show
the inferred column next to declared:
"**scope_appetite:** declared 0.8 (boil the ocean) ↔ observed 0.72 (close)"
Use words for the gap: 0.0-0.1 "close", 0.1-0.3 "drift", 0.3+ "mismatch".
This display gate is intentionally lower than the E1 **promotion gate**
(90+ days stable across 3+ skills, per `docs/designs/PLAN_TUNING_V0.md`).
Displaying inferred values is a UI affordance; shipping behavior-adapting
defaults based on the profile is consequential and needs a much higher
bar. Do NOT use the display gate as a green light for v2 E1 work.
- If the calibration gate isn't met, say: "Not enough observed data yet —
need N more events across M more skills before we can show your observed
profile."
@@ -339,12 +424,37 @@ the user decides whether declared is wrong or behavior is wrong.
## Stats
Cathedral T13 surfaces: host-aware breakdown (claude hook vs codex import
vs agent-enriched), marked vs hash-only, auto-decided count, and dream
cycle cost-to-date.
```bash
~/.claude/skills/gstack/bin/gstack-question-preference --stats
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
_LOG="$GSTACK_STATE_ROOT/projects/$SLUG/question-log.jsonl"
[ -f "$_LOG" ] && echo "TOTAL_LOGGED: $(wc -l < "$_LOG" | tr -d ' ')" || echo "TOTAL_LOGGED: 0"
if [ -f "$_LOG" ]; then
bun -e "
const lines = require('fs').readFileSync('$_LOG','utf-8').trim().split('\n').filter(Boolean);
const events = [];
for (const l of lines) { try { events.push(JSON.parse(l)); } catch {} }
const total = events.length;
const bySource = {};
let marked = 0;
for (const e of events) {
const src = e.source || 'agent';
bySource[src] = (bySource[src] || 0) + 1;
if (e.question_id && !e.question_id.startsWith('hook-')) marked++;
}
console.log('TOTAL_LOGGED: ' + total);
console.log('MARKED: ' + marked + ' (' + (total ? Math.round(100*marked/total) : 0) + '%)');
for (const s of Object.keys(bySource).sort()) {
console.log('SOURCE_' + s.toUpperCase().replace(/-/g,'_') + ': ' + bySource[s]);
}
"
else
echo 'TOTAL_LOGGED: 0'
fi
~/.claude/skills/gstack/bin/gstack-developer-profile --profile | bun -e "
const p = JSON.parse(await Bun.stdin.text());
const d = p.inferred?.diversity || {};
@@ -353,10 +463,174 @@ _LOG="$GSTACK_STATE_ROOT/projects/$SLUG/question-log.jsonl"
console.log('DAYS_SPAN: ' + (d.days_span ?? 0));
console.log('CALIBRATED: ' + (p.inferred?.sample_size >= 20 && d.skills_covered >= 3 && d.question_ids_covered >= 8 && d.days_span >= 7));
"
echo '---DISTILL---'
~/.claude/skills/gstack/bin/gstack-distill-free-text --status
```
Present as a compact summary with plain-English calibration status ("5 more
events across 2 more skills and you'll be calibrated" or "you're calibrated").
Surface the source breakdown so the user can see capture is real (Codex
correction — without source columns, the cathedral's "before:0 / after:>0"
claim is invisible).
---
## Recent auto-decisions
Show the last 10 questions where the PreToolUse hook auto-decided (source=
`auto-decided` in the log). Lets the user spot-check enforcement and flip
any that misfired via `always-ask`.
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
_LOG="$GSTACK_STATE_ROOT/projects/$SLUG/question-log.jsonl"
[ ! -f "$_LOG" ] && echo 'NO_LOG' || bun -e "
const lines = require('fs').readFileSync('$_LOG','utf-8').trim().split('\n').filter(Boolean);
const auto = [];
for (const l of lines) {
try { const e = JSON.parse(l); if (e.source === 'auto-decided') auto.push(e); } catch {}
}
const recent = auto.slice(-10).reverse();
if (!recent.length) { console.log('(no auto-decisions yet)'); process.exit(0); }
for (const r of recent) {
console.log(r.ts + ' ' + r.question_id + ' → ' + r.user_choice);
console.log(' ' + (r.question_summary || ''));
}
"
```
If any look wrong, offer: "Want to flip `<question_id>` to `always-ask`?"
Run `gstack-question-preference --write '{"question_id":"<id>","preference":
"always-ask","source":"plan-tune"}'` after Y.
---
## Audit unmarked questions
Top N hash-only question_ids by frequency. These are AUQ fires the cathedral
hook captured but cannot enforce against (no `<gstack-qid:foo>` marker in
the skill template — D18 progressive markers). Surfacing them drives marker
adoption: high-traffic unmarked questions are the next candidates to retrofit.
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
eval "$(~/.claude/skills/gstack/bin/gstack-paths)"
_LOG="$GSTACK_STATE_ROOT/projects/$SLUG/question-log.jsonl"
[ ! -f "$_LOG" ] && echo 'NO_LOG' || bun -e "
const lines = require('fs').readFileSync('$_LOG','utf-8').trim().split('\n').filter(Boolean);
const counts = {};
const summaries = {};
for (const l of lines) {
try {
const e = JSON.parse(l);
if (e.question_id && e.question_id.startsWith('hook-')) {
counts[e.question_id] = (counts[e.question_id] || 0) + 1;
summaries[e.question_id] = e.question_summary || '';
}
} catch {}
}
const rows = Object.entries(counts).sort((a,b) => b[1] - a[1]).slice(0, 10);
if (!rows.length) { console.log('(no unmarked questions — coverage is 100%)'); process.exit(0); }
for (const [id, n] of rows) {
console.log(n + 'x ' + id);
console.log(' ' + summaries[id]);
}
"
```
For each row, suggest where the marker should land (look up the skill from
the summary's wording, e.g. "Bundle this fix..." likely lives in
`ship/SKILL.md.tmpl`). Don't write markers without user approval — adding
markers changes which AUQ fires can be auto-decided, which is a substrate
expansion.
---
## Dream cycle review
**When this fires.** Step 0's dream-cycle gate: `distillation-proposals.json`
has at least one proposal with `applied_at` missing. Or the user explicitly
invokes via `/plan-tune distill` / `dream`.
**Flow:**
1. Show the proposals:
```bash
~/.claude/skills/gstack/bin/gstack-distill-apply --list
```
2. For each unapplied proposal, present it as a numbered item and use
AskUserQuestion (one per call, per skill convention). Show:
- Kind (`preference` / `declared-nudge` / `memory-nugget`)
- Confidence + rationale
- The source quotes verbatim (proves user-origin)
- What applying does (which file/key/dim changes)
3. **On accept** (Y): apply via the bin. The skill also publishes the
nugget to gbrain when configured.
For `memory-nugget`:
```bash
# If gbrain is configured, mirror via MCP first.
# (Pseudo — actual gbrain call happens at the agent layer via
# mcp__gbrain__put_page; the bin records the published flag.)
~/.claude/skills/gstack/bin/gstack-distill-apply --proposal N --gbrain-published true|false
```
For `preference`:
```bash
~/.claude/skills/gstack/bin/gstack-distill-apply --proposal N
```
For `declared-nudge`:
```bash
# Same bin; updates developer-profile.json declared dim with the
# clamped delta.
~/.claude/skills/gstack/bin/gstack-distill-apply --proposal N
```
4. **On decline**: skip without marking. User can re-decide later (the
proposal stays in the file). To dismiss permanently, manually clear:
`gstack-distill-apply --proposal N --dismiss` (not implemented in T11;
for now, regenerate via next distill run with corrected free-text).
5. **gbrain integration.** When `mcp__gbrain__*` tools are available in
this session:
- On `memory-nugget` apply: `mcp__gbrain__put_page` with the nugget +
`mcp__gbrain__extract_facts` + `mcp__gbrain__add_tag` per the cathedral
plan D9 routing. Then pass `--gbrain-published true` to the bin so
the proposals file records the mirror.
- When gbrain isn't configured (no MCP tools), the bin's local file
write is the durable source-of-truth and the PreToolUse hook reads it
via Layer 8 memory injection.
---
## Dream cycle distill (manual trigger)
**When this fires.** The user invokes `/plan-tune distill` / `dream` /
`distill` / `dream cycle`. Auto-triggered version lives in Step 0 gate #3.
**Flow:**
1. Run distill:
```bash
~/.claude/skills/gstack/bin/gstack-distill-free-text
```
2. If `RATE_CAPPED`: tell the user "You've hit today's 3 distills/day cap.
Run again tomorrow, or `/plan-tune stats` for run history."
3. If `NO_FREE_TEXT`: tell the user "No free-text answers since the last
distill. Keep using gstack — `Other` responses on AskUserQuestion feed
this loop."
4. If success: print the proposals count + estimated cost, then route into
`Dream cycle review` above for the user to approve each.
For background mode (e.g., the user wants to keep working):
```bash
~/.claude/skills/gstack/bin/gstack-distill-free-text --background
```
---
+5 -1
View File
@@ -648,7 +648,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"qa-only","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -654,7 +654,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"qa","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -665,7 +665,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"retro","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -650,7 +650,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"review","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -646,7 +646,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"scrape","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+125
View File
@@ -0,0 +1,125 @@
/**
* Declared-profile annotation helper (plan-tune cathedral T7).
*
* Given a kebab signal_key from scripts/question-registry.ts, returns a
* one-line plain-English annotation when the user's declared profile is in
* a strong band on the matching dimension, else null. Read-only never
* mutates the profile.
*
* Signature uses kebab signal_key per D2/Codex correction. Internally maps
* to the underscore Dimension key by consulting SIGNAL_MAP and picking the
* dimension this signal influences most strongly.
*
* Used by:
* - hosts/claude/hooks/question-preference-hook (Layer 3 injection path,
* when AUQ mutation lands)
* - scripts/resolvers/question-tuning.ts preamble (Layer 9 fallback,
* host-portable path on Codex / older Claude Code)
*
* NOT used for AUTO_DECIDE. Annotation is advisory only declared-only
* per TODOS.md E1 substrate-risk guidance. Inferred-driven AUTO_DECIDE
* remains v2.
*/
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
import { SIGNAL_MAP, type Dimension, ALL_DIMENSIONS } from './psychographic-signals';
const STRONG_HIGH = 0.7;
const STRONG_LOW = 0.3;
/**
* Plain-English phrasing per dimension + band. Keep one sentence each.
* Used directly in question prose, so phrasing matters.
*/
const DIMENSION_PHRASING: Record<Dimension, { high: string; low: string }> = {
scope_appetite: {
high: 'Your declared profile leans complete-implementation (boil the ocean).',
low: 'Your declared profile leans ship-small-fast.',
},
risk_tolerance: {
high: 'Your declared profile leans move-fast.',
low: 'Your declared profile leans check-carefully.',
},
detail_preference: {
high: 'Your declared profile leans verbose-with-tradeoffs.',
low: 'Your declared profile leans terse, just-do-it.',
},
autonomy: {
high: 'Your declared profile leans delegate-and-trust.',
low: 'Your declared profile leans consult-me-first.',
},
architecture_care: {
high: 'Your declared profile leans get-the-design-right.',
low: 'Your declared profile leans pragmatic-ship-it.',
},
};
interface DeveloperProfile {
declared?: Partial<Record<Dimension, number>>;
}
function stateRoot(): string {
return (
process.env.GSTACK_STATE_ROOT ||
process.env.GSTACK_HOME ||
path.join(os.homedir(), '.gstack')
);
}
function readProfile(): DeveloperProfile | null {
try {
const p = path.join(stateRoot(), 'developer-profile.json');
if (!fs.existsSync(p)) return null;
return JSON.parse(fs.readFileSync(p, 'utf-8'));
} catch {
return null;
}
}
/**
* Determine which dimension a signal_key influences most strongly.
* Sums |delta| across all user_choice DimensionDelta[] entries for that
* signal, returns the dimension with the largest total influence.
* Returns null if the signal_key isn't in the map.
*/
export function primaryDimensionFor(signalKey: string): Dimension | null {
const entry = SIGNAL_MAP[signalKey];
if (!entry) return null;
const totals: Partial<Record<Dimension, number>> = {};
for (const choice of Object.keys(entry)) {
for (const dd of entry[choice]) {
totals[dd.dim] = (totals[dd.dim] ?? 0) + Math.abs(dd.delta);
}
}
let best: Dimension | null = null;
let bestVal = -Infinity;
for (const d of ALL_DIMENSIONS) {
const v = totals[d] ?? 0;
if (v > bestVal) {
bestVal = v;
best = d;
}
}
return bestVal > 0 ? best : null;
}
/**
* Given a signal_key, return a one-line plain-English annotation when
* the user's declared profile is in a strong band on the primary dim,
* else null.
*/
export function getDeclaredAnnotation(signalKey: string): string | null {
if (!signalKey || typeof signalKey !== 'string') return null;
const dim = primaryDimensionFor(signalKey);
if (!dim) return null;
const profile = readProfile();
const declared = profile?.declared?.[dim];
if (typeof declared !== 'number') return null;
if (declared >= STRONG_HIGH) return DIMENSION_PHRASING[dim].high;
if (declared <= STRONG_LOW) return DIMENSION_PHRASING[dim].low;
return null;
}
+17
View File
@@ -187,6 +187,23 @@ export const SIGNAL_MAP: Record<string, Record<string, DimensionDelta[]>> = {
skip: [{ dim: 'architecture_care', delta: -0.04 }],
},
// -----------------------------------------------------------------------
// decision-autonomy — does the user trust the agent to apply decisions
// without checking back? (Cathedral T7: was the missing signal for the
// 'autonomy' dimension; added so /plan-tune annotations can render
// 'consult me' vs 'delegate' guidance on merge/rollback questions.)
// -----------------------------------------------------------------------
'decision-autonomy': {
accept: [{ dim: 'autonomy', delta: +0.04 }],
reject: [{ dim: 'autonomy', delta: -0.04 }],
// common option keys for "I'll review first" vs "go ahead":
'review-first': [{ dim: 'autonomy', delta: -0.05 }],
proceed: [{ dim: 'autonomy', delta: +0.05 }],
// /investigate-style: "agent applies fix" vs "show me the diff first"
'apply-fix': [{ dim: 'autonomy', delta: +0.04 }],
'show-diff': [{ dim: 'autonomy', delta: -0.04 }],
},
// -----------------------------------------------------------------------
// session-mode — office-hours goal selection
// -----------------------------------------------------------------------
+2
View File
@@ -455,6 +455,7 @@ export const QUESTIONS = {
category: 'approval',
door_type: 'one-way',
options: ['accept', 'reject'],
signal_key: 'decision-autonomy',
description: "Merge this PR to base branch?",
},
'land-and-deploy-rollback': {
@@ -463,6 +464,7 @@ export const QUESTIONS = {
category: 'approval',
door_type: 'one-way',
options: ['accept', 'reject'],
signal_key: 'decision-autonomy',
description: "Canary detected regressions — roll back the deploy?",
},
+5 -1
View File
@@ -25,7 +25,11 @@ export function generateQuestionTuning(ctx: TemplateContext): string {
Before each AskUserQuestion, choose \`question_id\` from \`scripts/question-registry.ts\` or \`{skill}-{slug}\`, then run \`${bin}/gstack-question-preference --check "<id>"\`. \`AUTO_DECIDE\` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." \`ASK_NORMALLY\` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append \`<gstack-qid:{question_id}>\` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered \`question_id\`.
**Embed the option recommendation via the \`(recommended)\` label suffix** on exactly one option per AUQ. The PreToolUse hook parses \`(recommended)\` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two \`(recommended)\` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
\`\`\`bash
${bin}/gstack-question-log '{"skill":"${ctx.skillName}","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
\`\`\`
+97
View File
@@ -1188,3 +1188,100 @@ if [ -x "$DETECT_BIN" ]; then
log " warning: gstack-gbrain-detect failed — brain-aware blocks will stay suppressed"
fi
fi
# 11. Plan-tune cathedral hook install (T8).
#
# Registers PostToolUse (deterministic AUQ capture) + PreToolUse (preference
# enforcement) hooks in ~/.claude/settings.json so /plan-tune actually does
# something at runtime instead of being agent-convention. Explicit consent UX
# per D4 + Codex: never mutate settings.json silently.
#
# Idempotent via _gstack_source tag = 'plan-tune-cathedral'. If both hooks
# already registered under that tag, the install is a no-op (no prompt).
PLAN_TUNE_LOG_HOOK="$SOURCE_GSTACK_DIR/hosts/claude/hooks/question-log-hook"
PLAN_TUNE_PREF_HOOK="$SOURCE_GSTACK_DIR/hosts/claude/hooks/question-preference-hook"
PLAN_TUNE_INSTALL_MARKER="$HOME/.gstack/.plan-tune-hooks-prompted"
if [ "$NO_TEAM_MODE" -ne 1 ] \
&& [ -x "$SETTINGS_HOOK" ] \
&& [ -x "$PLAN_TUNE_LOG_HOOK" ] \
&& [ -x "$PLAN_TUNE_PREF_HOOK" ]; then
# Already installed? Check the settings.json for our source tag.
ALREADY_INSTALLED=0
if "$SETTINGS_HOOK" list-sources 2>/dev/null | grep -q "plan-tune-cathedral"; then
ALREADY_INSTALLED=1
fi
if [ "$ALREADY_INSTALLED" -eq 1 ]; then
log ""
log "Plan-tune hooks already installed. Run \`$SETTINGS_HOOK list-sources\` to inspect."
elif [ -f "$PLAN_TUNE_INSTALL_MARKER" ]; then
# Previously declined. Don't re-ask. User can re-enable via /update-config.
:
elif [ -t 0 ] && [ -t 1 ]; then
# Interactive install with explicit consent + diff preview.
log ""
log "──────────────────────────────────────────────────────────"
log "Plan-tune cathedral: install Claude Code hooks?"
log "──────────────────────────────────────────────────────────"
log ""
log "These hooks make /plan-tune settings actually bind at runtime:"
log " • PostToolUse hook captures every AskUserQuestion fire (no agent"
log " compliance required). Today it's agent-convention and the log"
log " is empty in dogfood."
log " • PreToolUse hook enforces 'never-ask' preferences via Claude Code's"
log " permissionDecision protocol. Today preferences are agent-honored"
log " convention; this makes them binding."
log ""
log "Diff preview (PostToolUse capture hook):"
"$SETTINGS_HOOK" diff-event \
--event PostToolUse \
--matcher '(AskUserQuestion|mcp__.*__AskUserQuestion)' \
--command "$PLAN_TUNE_LOG_HOOK" \
--source plan-tune-cathedral \
--timeout 5 2>/dev/null || true
log ""
log "Backup: settings.json.bak.<ts> written before any mutation."
log "Rollback: $SETTINGS_HOOK rollback"
log ""
printf "Install both hooks now? [y/N] "
read -r PLAN_TUNE_INSTALL_REPLY
if [ "$PLAN_TUNE_INSTALL_REPLY" = "y" ] || [ "$PLAN_TUNE_INSTALL_REPLY" = "Y" ]; then
"$SETTINGS_HOOK" add-event \
--event PostToolUse \
--matcher '(AskUserQuestion|mcp__.*__AskUserQuestion)' \
--command "$PLAN_TUNE_LOG_HOOK" \
--source plan-tune-cathedral \
--timeout 5
"$SETTINGS_HOOK" add-event \
--event PreToolUse \
--matcher '(AskUserQuestion|mcp__.*__AskUserQuestion)' \
--command "$PLAN_TUNE_PREF_HOOK" \
--source plan-tune-cathedral \
--timeout 5
log ""
log "Plan-tune hooks installed. Run /plan-tune anytime to inspect."
else
log ""
log "Skipped. Re-run ./setup or use /update-config to install later."
fi
touch "$PLAN_TUNE_INSTALL_MARKER"
else
# Non-interactive (CI, scripted setup). Don't prompt; print one-liner.
log ""
log "Plan-tune cathedral hooks not installed (non-interactive setup)."
log "Install with:"
log " $SETTINGS_HOOK add-event --event PostToolUse \\"
log " --matcher '(AskUserQuestion|mcp__.*__AskUserQuestion)' \\"
log " --command $PLAN_TUNE_LOG_HOOK --source plan-tune-cathedral --timeout 5"
log " $SETTINGS_HOOK add-event --event PreToolUse \\"
log " --matcher '(AskUserQuestion|mcp__.*__AskUserQuestion)' \\"
log " --command $PLAN_TUNE_PREF_HOOK --source plan-tune-cathedral --timeout 5"
fi
fi
# Also tear down plan-tune hooks on --no-team (matches the existing pattern).
if [ "$NO_TEAM_MODE" -eq 1 ] && [ -x "$SETTINGS_HOOK" ]; then
"$SETTINGS_HOOK" remove-source --source plan-tune-cathedral 2>/dev/null || true
fi
+5 -1
View File
@@ -649,7 +649,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"setup-deploy","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -648,7 +648,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"setup-gbrain","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+28 -1
View File
@@ -650,7 +650,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"ship","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
@@ -3082,6 +3086,29 @@ This step is automatic — never skip it, never ask for confirmation.
---
## Step 21: Plan-tune discoverability nudge (first-successful-ship only)
Plan-tune cathedral T15. After a successful ship, surface /plan-tune once
per machine. Single line, non-blocking, marker-gated so it never re-fires.
```bash
_NUDGE_MARKER="$HOME/.gstack/.plan-tune-nudge-shown"
_QT=$(~/.claude/skills/gstack/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
if [ ! -f "$_NUDGE_MARKER" ] && [ "$_QT" = "false" ]; then
echo ""
echo "gstack can learn from your AskUserQuestion answers. Run /plan-tune to opt in"
echo "— it captures which prompts you find valuable vs noisy and (with hooks installed)"
echo "auto-decides your never-ask preferences."
touch "$_NUDGE_MARKER"
fi
```
If the marker exists, OR question_tuning is already on, the nudge is a
no-op. The marker guarantees at-most-once per machine. To re-enable:
`rm ~/.gstack/.plan-tune-nudge-shown` before next ship.
---
## Important Rules
- **Never skip tests.** If tests fail, stop.
+23
View File
@@ -975,6 +975,29 @@ This step is automatic — never skip it, never ask for confirmation.
---
## Step 21: Plan-tune discoverability nudge (first-successful-ship only)
Plan-tune cathedral T15. After a successful ship, surface /plan-tune once
per machine. Single line, non-blocking, marker-gated so it never re-fires.
```bash
_NUDGE_MARKER="$HOME/.gstack/.plan-tune-nudge-shown"
_QT=$(~/.claude/skills/gstack/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
if [ ! -f "$_NUDGE_MARKER" ] && [ "$_QT" = "false" ]; then
echo ""
echo "gstack can learn from your AskUserQuestion answers. Run /plan-tune to opt in"
echo "— it captures which prompts you find valuable vs noisy and (with hooks installed)"
echo "auto-decides your never-ask preferences."
touch "$_NUDGE_MARKER"
fi
```
If the marker exists, OR question_tuning is already on, the nudge is a
no-op. The marker guarantees at-most-once per machine. To re-enable:
`rm ~/.gstack/.plan-tune-nudge-shown` before next ship.
---
## Important Rules
- **Never skip tests.** If tests fail, stop.
+5 -1
View File
@@ -646,7 +646,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"skillify","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+10 -2
View File
@@ -647,7 +647,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"spec","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
@@ -1586,7 +1590,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"spec","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+5 -1
View File
@@ -648,7 +648,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"sync-gbrain","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
+129
View File
@@ -0,0 +1,129 @@
/**
* Declared annotation helper (plan-tune cathedral T7) unit tests.
*
* Verifies the helper's contract:
* - Returns null for unknown signal_key.
* - Returns null when the profile doesn't exist or declared is unset.
* - Returns a phrase when declared >= 0.7 (strong high band).
* - Returns a phrase when declared <= 0.3 (strong low band).
* - Returns null when declared is in the middle band (0.3 < x < 0.7).
* - primaryDimensionFor picks the dimension with largest |delta| total.
* - Maps kebab signal_key to underscore Dimension correctly (D2 fix).
*/
import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
import { getDeclaredAnnotation, primaryDimensionFor } from '../scripts/declared-annotation';
let prevStateRoot: string | undefined;
let prevHome: string | undefined;
let stateRoot: string;
beforeEach(() => {
stateRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-annot-'));
prevStateRoot = process.env.GSTACK_STATE_ROOT;
prevHome = process.env.GSTACK_HOME;
process.env.GSTACK_STATE_ROOT = stateRoot;
delete process.env.GSTACK_HOME;
});
afterEach(() => {
if (prevStateRoot !== undefined) process.env.GSTACK_STATE_ROOT = prevStateRoot;
else delete process.env.GSTACK_STATE_ROOT;
if (prevHome !== undefined) process.env.GSTACK_HOME = prevHome;
fs.rmSync(stateRoot, { recursive: true, force: true });
});
function writeProfile(declared: Record<string, number>): void {
const p = path.join(stateRoot, 'developer-profile.json');
fs.writeFileSync(p, JSON.stringify({ declared }, null, 2));
}
// ----------------------------------------------------------------------
// primaryDimensionFor — kebab→underscore mapping
// ----------------------------------------------------------------------
describe('primaryDimensionFor', () => {
test('scope-appetite → scope_appetite (largest |delta| total)', () => {
expect(primaryDimensionFor('scope-appetite')).toBe('scope_appetite');
});
test('architecture-care → architecture_care (top dim by |delta|)', () => {
expect(primaryDimensionFor('architecture-care')).toBe('architecture_care');
});
test('unknown signal_key → null', () => {
expect(primaryDimensionFor('totally-not-a-key')).toBe(null);
});
test('empty/garbage input → null', () => {
expect(primaryDimensionFor('')).toBe(null);
});
});
// ----------------------------------------------------------------------
// getDeclaredAnnotation
// ----------------------------------------------------------------------
describe('getDeclaredAnnotation', () => {
test('returns null when no profile exists', () => {
expect(getDeclaredAnnotation('scope-appetite')).toBe(null);
});
test('returns null when declared unset for the dimension', () => {
writeProfile({});
expect(getDeclaredAnnotation('scope-appetite')).toBe(null);
});
test('returns null when declared is in middle band (0.5)', () => {
writeProfile({ scope_appetite: 0.5 });
expect(getDeclaredAnnotation('scope-appetite')).toBe(null);
});
test('returns high-band phrase when declared >= 0.7', () => {
writeProfile({ scope_appetite: 0.85 });
const annot = getDeclaredAnnotation('scope-appetite');
expect(annot).toBeTruthy();
expect(annot).toContain('boil the ocean');
});
test('returns high-band phrase at the exact 0.7 threshold', () => {
writeProfile({ scope_appetite: 0.7 });
expect(getDeclaredAnnotation('scope-appetite')).toContain('boil the ocean');
});
test('returns low-band phrase when declared <= 0.3', () => {
writeProfile({ scope_appetite: 0.2 });
const annot = getDeclaredAnnotation('scope-appetite');
expect(annot).toBeTruthy();
expect(annot).toContain('ship-small-fast');
});
test('returns low-band phrase at the exact 0.3 threshold', () => {
writeProfile({ scope_appetite: 0.3 });
expect(getDeclaredAnnotation('scope-appetite')).toContain('ship-small-fast');
});
test('returns null for unknown signal_key even when profile populated', () => {
writeProfile({ scope_appetite: 0.85 });
expect(getDeclaredAnnotation('totally-not-a-key')).toBe(null);
});
test('all 5 dimensions render distinct high-band phrases', () => {
// Use the 5 signal_keys known to map to each of the 5 dimensions.
writeProfile({
scope_appetite: 0.9,
risk_tolerance: 0.9,
detail_preference: 0.9,
autonomy: 0.9,
architecture_care: 0.9,
});
const scope = getDeclaredAnnotation('scope-appetite');
const arch = getDeclaredAnnotation('architecture-care');
expect(scope).toContain('boil the ocean');
expect(arch).toContain('design-right');
});
});
+300
View File
@@ -0,0 +1,300 @@
/**
* gstack-distill-apply Layer 8 proposal application (plan-tune cathedral T11).
*
* Verifies the three apply paths:
* - memory-nugget appended to ~/.gstack/free-text-memory.json (local
* source-of-truth; gbrain is mirror when configured).
* - preference routed through gstack-question-preference with
* source=plan-tune (user-origin gate cleared).
* - declared-nudge atomic update to developer-profile.json declared dim,
* small=0.05, medium=0.10, large=0.15, clamped to [0,1].
* Plus:
* - --list shows proposals with kind, confidence, rationale, quotes.
* - Applied proposals get applied_at + gbrain_published flag.
* - Bad --proposal index errors with non-zero exit.
*/
import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
import { spawnSync } from 'child_process';
const ROOT = path.resolve(import.meta.dir, '..');
const BIN = path.join(ROOT, 'bin', 'gstack-distill-apply');
let stateRoot: string;
let fixtureCwd: string;
let cwdSlug: string;
let proposalFile: string;
beforeEach(() => {
stateRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-apply-'));
cwdSlug = 'apply-fixture';
fixtureCwd = path.join(stateRoot, cwdSlug);
fs.mkdirSync(fixtureCwd, { recursive: true });
fs.mkdirSync(path.join(stateRoot, 'projects', cwdSlug), { recursive: true });
proposalFile = path.join(stateRoot, 'projects', cwdSlug, 'distillation-proposals.json');
});
afterEach(() => {
fs.rmSync(stateRoot, { recursive: true, force: true });
});
function writeProposals(proposals: Array<Record<string, unknown>>): void {
fs.writeFileSync(
proposalFile,
JSON.stringify(
{ generated_at: new Date().toISOString(), source_event_count: 1, proposals },
null,
2,
),
);
}
function run(args: string[]): { stdout: string; stderr: string; status: number } {
const env: Record<string, string> = {};
for (const [k, v] of Object.entries(process.env)) {
if (v !== undefined) env[k] = v;
}
env.GSTACK_STATE_ROOT = stateRoot;
env.GSTACK_QUESTION_LOG_NO_DERIVE = '1';
delete env.GSTACK_HOME;
const res = spawnSync(BIN, args, { env, encoding: 'utf-8', cwd: fixtureCwd });
return {
stdout: res.stdout ?? '',
stderr: res.stderr ?? '',
status: res.status ?? -1,
};
}
// ----------------------------------------------------------------------
// --list
// ----------------------------------------------------------------------
describe('--list', () => {
test('handles missing proposals file', () => {
const r = run(['--list']);
expect(r.status).toBe(0);
expect(r.stdout).toMatch(/NO_PROPOSALS/);
});
test('renders all 3 kinds + source quotes', () => {
writeProposals([
{
kind: 'preference',
confidence: 0.9,
question_id: 'ship-changelog-voice-polish',
preference: 'never-ask',
rationale: 'user repeatedly skipped this',
source_quotes: ['skip the polish for typo PRs'],
},
{
kind: 'declared-nudge',
confidence: 0.85,
dimension: 'scope_appetite',
direction: 'up',
magnitude: 'medium',
},
{
kind: 'memory-nugget',
confidence: 0.95,
nugget: 'User prefers complete edge cases',
applies_to_signal_keys: ['scope-appetite'],
},
]);
const r = run(['--list']);
expect(r.status).toBe(0);
expect(r.stdout).toContain('preference');
expect(r.stdout).toContain('declared-nudge');
expect(r.stdout).toContain('memory-nugget');
expect(r.stdout).toContain('skip the polish for typo PRs');
expect(r.stdout).toContain('scope-appetite');
});
});
// ----------------------------------------------------------------------
// memory-nugget application
// ----------------------------------------------------------------------
describe('memory-nugget apply', () => {
test('appends to ~/.gstack/free-text-memory.json with full metadata', () => {
writeProposals([
{
kind: 'memory-nugget',
confidence: 0.9,
nugget: 'User prefers verbose explanations with tradeoffs',
applies_to_signal_keys: ['detail-preference'],
source_quotes: ['always explain the tradeoffs'],
},
]);
const r = run(['--proposal', '0', '--gbrain-published', 'true']);
expect(r.status).toBe(0);
expect(r.stdout).toContain('APPLIED: memory-nugget');
const memPath = path.join(stateRoot, 'free-text-memory.json');
const mem = JSON.parse(fs.readFileSync(memPath, 'utf-8'));
expect(mem.nuggets.length).toBe(1);
expect(mem.nuggets[0].nugget).toContain('verbose explanations');
expect(mem.nuggets[0].applies_to_signal_keys).toEqual(['detail-preference']);
expect(mem.nuggets[0].gbrain_published).toBe(true);
expect(mem.nuggets[0].source_quotes).toEqual(['always explain the tradeoffs']);
});
test('appends without clobbering existing nuggets', () => {
fs.writeFileSync(
path.join(stateRoot, 'free-text-memory.json'),
JSON.stringify({ nuggets: [{ nugget: 'pre-existing', applies_to_signal_keys: [] }] }),
);
writeProposals([
{
kind: 'memory-nugget',
confidence: 0.9,
nugget: 'new nugget',
applies_to_signal_keys: [],
},
]);
run(['--proposal', '0']);
const mem = JSON.parse(
fs.readFileSync(path.join(stateRoot, 'free-text-memory.json'), 'utf-8'),
);
expect(mem.nuggets.length).toBe(2);
expect(mem.nuggets[0].nugget).toBe('pre-existing');
expect(mem.nuggets[1].nugget).toBe('new nugget');
});
});
// ----------------------------------------------------------------------
// preference application
// ----------------------------------------------------------------------
describe('preference apply', () => {
test('routes through gstack-question-preference with source=plan-tune', () => {
writeProposals([
{
kind: 'preference',
confidence: 0.9,
question_id: 'ship-changelog-voice-polish',
preference: 'never-ask',
source_quotes: ['skip the polish for typo PRs'],
},
]);
const r = run(['--proposal', '0']);
expect(r.status).toBe(0);
expect(r.stdout).toContain('APPLIED: preference');
const prefPath = path.join(stateRoot, 'projects', cwdSlug, 'question-preferences.json');
const prefs = JSON.parse(fs.readFileSync(prefPath, 'utf-8'));
expect(prefs['ship-changelog-voice-polish']).toBe('never-ask');
});
});
// ----------------------------------------------------------------------
// declared-nudge application
// ----------------------------------------------------------------------
describe('declared-nudge apply', () => {
test('medium up nudge on unset dim → 0.5 + 0.10 = 0.6', () => {
writeProposals([
{
kind: 'declared-nudge',
confidence: 0.9,
dimension: 'scope_appetite',
direction: 'up',
magnitude: 'medium',
},
]);
run(['--proposal', '0']);
const profile = JSON.parse(
fs.readFileSync(path.join(stateRoot, 'developer-profile.json'), 'utf-8'),
);
expect(profile.declared.scope_appetite).toBe(0.6);
});
test('small down nudge on existing value', () => {
fs.writeFileSync(
path.join(stateRoot, 'developer-profile.json'),
JSON.stringify({ declared: { scope_appetite: 0.8 } }),
);
writeProposals([
{
kind: 'declared-nudge',
confidence: 0.9,
dimension: 'scope_appetite',
direction: 'down',
magnitude: 'small',
},
]);
run(['--proposal', '0']);
const profile = JSON.parse(
fs.readFileSync(path.join(stateRoot, 'developer-profile.json'), 'utf-8'),
);
expect(profile.declared.scope_appetite).toBe(0.75);
});
test('clamps to [0, 1]', () => {
fs.writeFileSync(
path.join(stateRoot, 'developer-profile.json'),
JSON.stringify({ declared: { scope_appetite: 0.95 } }),
);
writeProposals([
{
kind: 'declared-nudge',
confidence: 0.9,
dimension: 'scope_appetite',
direction: 'up',
magnitude: 'large',
},
]);
run(['--proposal', '0']);
const profile = JSON.parse(
fs.readFileSync(path.join(stateRoot, 'developer-profile.json'), 'utf-8'),
);
expect(profile.declared.scope_appetite).toBe(1);
});
});
// ----------------------------------------------------------------------
// Proposal marked applied
// ----------------------------------------------------------------------
describe('proposal marked applied', () => {
test('applied_at + gbrain_published written back to proposals.json', () => {
writeProposals([
{
kind: 'memory-nugget',
confidence: 0.9,
nugget: 'something',
applies_to_signal_keys: [],
},
]);
run(['--proposal', '0', '--gbrain-published', 'true']);
const p = JSON.parse(fs.readFileSync(proposalFile, 'utf-8'));
expect(p.proposals[0].applied_at).toBeTruthy();
expect(p.proposals[0].gbrain_published).toBe(true);
});
});
// ----------------------------------------------------------------------
// Error paths
// ----------------------------------------------------------------------
describe('error paths', () => {
test('bad --proposal index exits non-zero', () => {
writeProposals([
{ kind: 'memory-nugget', confidence: 0.9, nugget: 'x', applies_to_signal_keys: [] },
]);
const r = run(['--proposal', '99']);
expect(r.status).not.toBe(0);
expect(r.stderr).toContain('invalid --proposal');
});
test('missing --proposal exits non-zero', () => {
writeProposals([
{ kind: 'memory-nugget', confidence: 0.9, nugget: 'x', applies_to_signal_keys: [] },
]);
const r = run([]);
expect(r.status).not.toBe(0);
expect(r.stderr).toContain('--proposal');
});
});
+205
View File
@@ -0,0 +1,205 @@
/**
* gstack-distill-free-text Layer 8 dream cycle (plan-tune cathedral T10).
*
* Covers the SDK-free paths: status, dry-run, rate cap, no-event handling.
* The real API call path is exercised by the E2E test in T16; here we
* verify the bin's deterministic plumbing without burning tokens.
*/
import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
import { spawnSync } from 'child_process';
const ROOT = path.resolve(import.meta.dir, '..');
const BIN = path.join(ROOT, 'bin', 'gstack-distill-free-text');
const QLOG_BIN = path.join(ROOT, 'bin', 'gstack-question-log');
let stateRoot: string;
let fixtureCwd: string;
let cwdSlug: string;
beforeEach(() => {
stateRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-dist-'));
cwdSlug = 'distill-fixture';
fixtureCwd = path.join(stateRoot, cwdSlug);
fs.mkdirSync(fixtureCwd, { recursive: true });
});
afterEach(() => {
fs.rmSync(stateRoot, { recursive: true, force: true });
});
function makeEnv(extra: Record<string, string> = {}): Record<string, string> {
const env: Record<string, string> = {};
for (const [k, v] of Object.entries(process.env)) {
if (v !== undefined) env[k] = v;
}
env.GSTACK_STATE_ROOT = stateRoot;
env.GSTACK_QUESTION_LOG_NO_DERIVE = '1';
delete env.GSTACK_HOME;
return { ...env, ...extra };
}
function run(args: string[]): { stdout: string; stderr: string; status: number } {
const res = spawnSync(BIN, args, {
env: makeEnv(),
encoding: 'utf-8',
cwd: fixtureCwd,
});
return {
stdout: res.stdout ?? '',
stderr: res.stderr ?? '',
status: res.status ?? -1,
};
}
function writeAuqOtherEvent(text: string): void {
spawnSync(
QLOG_BIN,
[
JSON.stringify({
skill: 'plan-tune',
question_id: 'hook-distill00',
question_summary: 'Test question for distillation',
options_count: 2,
user_choice: 'Other',
source: 'auq-other',
free_text: text,
session_id: 's-distill',
tool_use_id: 'tu-distill-' + Math.random().toString(36).slice(2, 8),
}),
],
{
env: makeEnv(),
cwd: fixtureCwd,
encoding: 'utf-8',
},
);
}
function writeCostLogEntry(slug: string, dateIso: string): void {
fs.mkdirSync(stateRoot, { recursive: true });
fs.appendFileSync(
path.join(stateRoot, 'distill-cost.jsonl'),
JSON.stringify({ ts: dateIso, slug, proposals_count: 0, cost_usd_est: 0 }) + '\n',
);
}
// ----------------------------------------------------------------------
// Status subcommand
// ----------------------------------------------------------------------
describe('--status', () => {
test('reports "no runs yet" when cost log absent', () => {
const r = run(['--status']);
expect(r.status).toBe(0);
expect(r.stdout).toMatch(/no distill runs/);
});
test('reports counts when prior runs exist', () => {
writeCostLogEntry(cwdSlug, new Date().toISOString());
writeCostLogEntry(cwdSlug, new Date().toISOString());
const r = run(['--status']);
expect(r.status).toBe(0);
expect(r.stdout).toContain('RUNS: 2');
expect(r.stdout).toMatch(/TODAY: 2 run\(s\)/);
});
});
// ----------------------------------------------------------------------
// No rate cap (v1.52.0.0 cap audit) — the natural rate of free-text events
// is rare enough that count-based capping was theatrical. Cost log alone
// provides auditability via --status.
// ----------------------------------------------------------------------
describe('no rate cap (audit removed)', () => {
test('never exits with RATE_CAPPED, even with many runs today', () => {
const today = new Date().toISOString();
for (let i = 0; i < 10; i++) writeCostLogEntry(cwdSlug, today);
const r = run([]);
expect(r.status).toBe(0);
expect(r.stdout).not.toMatch(/RATE_CAPPED/);
});
});
// ----------------------------------------------------------------------
// No events / no log
// ----------------------------------------------------------------------
describe('no-event paths', () => {
test('exits NO_LOG when question-log.jsonl missing', () => {
const r = run([]);
expect(r.status).toBe(0);
expect(r.stdout).toMatch(/NO_LOG/);
});
test('exits NO_FREE_TEXT when log has events but none are auq-other', () => {
spawnSync(
QLOG_BIN,
[
JSON.stringify({
skill: 'plan-tune',
question_id: 'hook-other00',
question_summary: 'Q',
options_count: 2,
user_choice: 'A',
source: 'hook',
session_id: 's',
tool_use_id: 'tu-x',
}),
],
{ env: makeEnv(), cwd: fixtureCwd, encoding: 'utf-8' },
);
const r = run([]);
expect(r.status).toBe(0);
expect(r.stdout).toMatch(/NO_FREE_TEXT/);
});
});
// ----------------------------------------------------------------------
// Dry-run
// ----------------------------------------------------------------------
describe('--dry-run', () => {
test('emits the distill prompt + events JSON without calling API', () => {
writeAuqOtherEvent('I always include tests with new features');
writeAuqOtherEvent('Skip design review for typo fixes');
// Strip ANTHROPIC_API_KEY to prove no API call happens.
const env = makeEnv();
delete env.ANTHROPIC_API_KEY;
const res = spawnSync(BIN, ['--dry-run'], { env, cwd: fixtureCwd, encoding: 'utf-8' });
expect(res.status).toBe(0);
expect(res.stdout).toContain('DISTILL PROMPT');
expect(res.stdout).toContain('always include tests');
});
});
// ----------------------------------------------------------------------
// API key required
// ----------------------------------------------------------------------
describe('API auth', () => {
test('fails loud when ANTHROPIC_API_KEY missing on sync run', () => {
writeAuqOtherEvent('Some free text response that needs distilling');
const env = makeEnv();
delete env.ANTHROPIC_API_KEY;
const res = spawnSync(BIN, [], { env, cwd: fixtureCwd, encoding: 'utf-8' });
expect(res.status).not.toBe(0);
expect(res.stderr).toMatch(/ANTHROPIC_API_KEY/);
expect(res.stderr).toMatch(/separate billing/);
});
});
// ----------------------------------------------------------------------
// Background spawn
// ----------------------------------------------------------------------
describe('--background', () => {
test('detaches and exits with DISTILL_SPAWNED', () => {
const r = run(['--background']);
expect(r.status).toBe(0);
expect(r.stdout).toMatch(/DISTILL_SPAWNED: pid=\d+/);
});
});
+28 -1
View File
@@ -650,7 +650,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `~/.claude/skills/gstack/bin/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
~/.claude/skills/gstack/bin/gstack-question-log '{"skill":"ship","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
@@ -3082,6 +3086,29 @@ This step is automatic — never skip it, never ask for confirmation.
---
## Step 21: Plan-tune discoverability nudge (first-successful-ship only)
Plan-tune cathedral T15. After a successful ship, surface /plan-tune once
per machine. Single line, non-blocking, marker-gated so it never re-fires.
```bash
_NUDGE_MARKER="$HOME/.gstack/.plan-tune-nudge-shown"
_QT=$(~/.claude/skills/gstack/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
if [ ! -f "$_NUDGE_MARKER" ] && [ "$_QT" = "false" ]; then
echo ""
echo "gstack can learn from your AskUserQuestion answers. Run /plan-tune to opt in"
echo "— it captures which prompts you find valuable vs noisy and (with hooks installed)"
echo "auto-decides your never-ask preferences."
touch "$_NUDGE_MARKER"
fi
```
If the marker exists, OR question_tuning is already on, the nudge is a
no-op. The marker guarantees at-most-once per machine. To re-enable:
`rm ~/.gstack/.plan-tune-nudge-shown` before next ship.
---
## Important Rules
- **Never skip tests.** If tests fail, stop.
+28 -1
View File
@@ -636,7 +636,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `$GSTACK_BIN/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
$GSTACK_BIN/gstack-question-log '{"skill":"ship","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
@@ -2692,6 +2696,29 @@ This step is automatic — never skip it, never ask for confirmation.
---
## Step 21: Plan-tune discoverability nudge (first-successful-ship only)
Plan-tune cathedral T15. After a successful ship, surface /plan-tune once
per machine. Single line, non-blocking, marker-gated so it never re-fires.
```bash
_NUDGE_MARKER="$HOME/.gstack/.plan-tune-nudge-shown"
_QT=$($GSTACK_ROOT/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
if [ ! -f "$_NUDGE_MARKER" ] && [ "$_QT" = "false" ]; then
echo ""
echo "gstack can learn from your AskUserQuestion answers. Run /plan-tune to opt in"
echo "— it captures which prompts you find valuable vs noisy and (with hooks installed)"
echo "auto-decides your never-ask preferences."
touch "$_NUDGE_MARKER"
fi
```
If the marker exists, OR question_tuning is already on, the nudge is a
no-op. The marker guarantees at-most-once per machine. To re-enable:
`rm ~/.gstack/.plan-tune-nudge-shown` before next ship.
---
## Important Rules
- **Never skip tests.** If tests fail, stop.
+28 -1
View File
@@ -638,7 +638,11 @@ If you are looping on the same diagnostic, same file, or failed fix variants, ST
Before each AskUserQuestion, choose `question_id` from `scripts/question-registry.ts` or `{skill}-{slug}`, then run `$GSTACK_BIN/gstack-question-preference --check "<id>"`. `AUTO_DECIDE` means choose the recommended option and say "Auto-decided [summary] → [option] (your preference). Change with /plan-tune." `ASK_NORMALLY` means ask.
After answer, log best-effort:
**Embed the question_id as a marker in the question text** so hooks can identify it deterministically (plan-tune cathedral T14 / D18 progressive markers). Append `<gstack-qid:{question_id}>` somewhere in the rendered question (the leading line or trailing line is fine; the marker doesn't render visibly to the user when wrapped in HTML-style angle brackets, but the hook strips it). Without the marker the PreToolUse enforcement hook treats the AUQ as observed-only and never auto-decides — so always include it when the question matches a registered `question_id`.
**Embed the option recommendation via the `(recommended)` label suffix** on exactly one option per AUQ. The PreToolUse hook parses `(recommended)` first, falls back to "Recommendation: X" prose, and refuses to auto-decide if ambiguous. Two `(recommended)` labels = refuse.
After answer, log best-effort (PostToolUse hook also captures deterministically when installed; dedup on (source, tool_use_id) handles double-writes):
```bash
$GSTACK_BIN/gstack-question-log '{"skill":"ship","question_id":"<id>","question_summary":"<short>","category":"<approval|clarification|routing|cherry-pick|feedback-loop>","door_type":"<one-way|two-way>","options_count":N,"user_choice":"<key>","recommended":"<key>","session_id":"'"$_SESSION_ID"'"}' 2>/dev/null || true
```
@@ -3070,6 +3074,29 @@ This step is automatic — never skip it, never ask for confirmation.
---
## Step 21: Plan-tune discoverability nudge (first-successful-ship only)
Plan-tune cathedral T15. After a successful ship, surface /plan-tune once
per machine. Single line, non-blocking, marker-gated so it never re-fires.
```bash
_NUDGE_MARKER="$HOME/.gstack/.plan-tune-nudge-shown"
_QT=$($GSTACK_ROOT/bin/gstack-config get question_tuning 2>/dev/null || echo "false")
if [ ! -f "$_NUDGE_MARKER" ] && [ "$_QT" = "false" ]; then
echo ""
echo "gstack can learn from your AskUserQuestion answers. Run /plan-tune to opt in"
echo "— it captures which prompts you find valuable vs noisy and (with hooks installed)"
echo "auto-decides your never-ask preferences."
touch "$_NUDGE_MARKER"
fi
```
If the marker exists, OR question_tuning is already on, the nudge is a
no-op. The marker guarantees at-most-once per machine. To re-enable:
`rm ~/.gstack/.plan-tune-nudge-shown` before next ship.
---
## Important Rules
- **Never skip tests.** If tests fail, stop.

Some files were not shown because too many files have changed in this diff Show More