mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-10 20:07:49 +02:00
v1.57.5.0 feat: cross-session decision memory + gbrain dream-stage call graph (#1910)
* feat(gbrain-sync): add cycleCompleted() cycle-state probe

  Reads `gbrain doctor` cycle_freshness to classify whether a source has completed a full cycle (completed/never/unknown). A fail naming this source -> never; a fail naming only other sources -> completed; an absent or unparseable check -> unknown, so an unrelated doctor failure never masks a real state. Gates the automatic call-graph build on --full.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(gbrain-sync): --dream call-graph stage with lock-free gate + honest outcome guard

  Adds a source-scoped `gbrain dream --source <id>` stage that builds this worktree's call graph (code-callers/code-callees). Runs lock-free after the sync lock releases so it never blocks sibling worktrees; a .dream-in-progress marker dedupes concurrent dreams. --full auto-runs it only when the cycle was never built; explicit --dream always forces; --no-dream opts out.

  The stage parses the cycle's own output and reports the truth, not a flat "built": a WARN when the schema pack can't extract code symbols, when the embed phase failed for a missing key, or when 0 edges resolved; OK with the resolved-edge count otherwise. gbrain exits 0 even when it skips on a held cycle lock (e.g. autopilot), so that case reports SKIP, not success.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: ignore gbrain .sources/ local staging dir

  gbrain writes per-source staging and capability-check artifacts under .sources/ in the repo root. It's machine-local runtime state, not source.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(gbrain): honest call-graph guidance in /sync-gbrain + pin works on gbrain>=0.41.38

  sync-gbrain frames the --dream offer honestly: building a call graph requires a code-aware schema pack, and the dream stage reports a WARN when it can't. The verdict's Call graph row mirrors the dream stage's real outcome instead of assuming a completed cycle means edges exist.

  The ## GBrain Search Guidance block written into CLAUDE.md drops the old code-callers --source caveat: gbrain >=0.41.38.0 honors the .gbrain-source pin for code-callers/code-callees.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(jsonl-store): shared audited JSONL plumbing (injection-reject + atomic append + tolerant read)

  Single source of truth extracted for D2A: gstack-learnings-* and the upcoming gstack-decision-* bins share one injection-pattern list, one atomic single-line appender, and one tolerant reader. No more drift between stores.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(learnings-log): use shared hasInjection from lib/jsonl-store (D2A)

  Replace the inline injection-pattern copy with the shared list. One audited write-path rejection across learnings + the upcoming decision store. Behavior unchanged (35/35 learnings tests green); learnings-search keeps its inline copy because a structural test pins its bash/bun shape.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(decision): event-sourced decision-memory model (lib/gstack-decision)

  decide/supersede/redact events on lib/jsonl-store; active set is computed (no mutable status), dangling refs tolerated. Free-text is injection-checked and redact-scanned on write (HIGH secret -> reject). Scope filter (repo/branch/issue) for relevant resurfacing. File-only + reliable; gbrain not required.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(decision): bounded active snapshot + compaction (redact expunges, supersede archives)

  writeSnapshot/readSnapshot/rebuildSnapshot give an O(active) bounded read for the session-start hot path (D1A). compact() rewrites the log to active, archives superseded decisions for history, and EXPUNGES redacted ones (dropped, never archived) so an accidentally-captured secret leaves the store for good.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(decision): gstack-decision-log + gstack-decision-search bins (non-interactive)

  Two bins mirroring gstack-learnings-* (D3A). log writes decide/--supersede/--redact/--compact events + refreshes the bounded snapshot + enqueues for cross-machine sync; search reads the O(active) snapshot, scope-filtered to current branch, newest-first, --all to include superseded, --json for machines. Empty store returns silently (no snapshot write on an empty read).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(memory): surface active decisions at session start + capture nudge (Context Recovery)

  Context Recovery now shows recent scope-relevant active decisions (bounded read of decisions.active.json via gstack-decision-search) and instructs the agent to treat them as settled calls and to log durable decisions/reversals. Closes the Phase-1 capture->curate->resurface loop, reliable + file-only.

  Regen across all hosts folded in (squash-with-regen); parity 10/10, freshness green.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test: refresh ship golden baselines for the memory-loop preamble change

  Context Recovery now emits the cross-session-decisions block, so ship's preamble (all hosts) changed. Golden baselines are hand-maintained copies (gen does not write them); refresh them from the fresh gen so golden-file regression passes.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(memory): document the cross-session decision-memory loop in CLAUDE.md

  Adds a '## Cross-session decision memory' section: how to resurface (gstack-decision-search) and capture (gstack-decision-log) durable decisions, the supersede/redact/compact verbs, and a crisp durable-vs-trivial definition so the store stays signal. Reliable file-only path; gbrain not required.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(memory): emit durable decisions from ship/ceo/eng/spec at structured points

  Wires the four skills that finalize real decisions to capture them in the cross-session decision store, from their STRUCTURED outputs (never free-text scraping):
  - ship: the version bump (level + why) at write time
  - plan-ceo-review: accepted scope + verdict (branch-scoped)
  - plan-eng-review: the architecture verdict + key call (branch-scoped)
  - spec: the filed issue's core approach (issue-scoped)

  All emits are non-interactive, schema-correct (content in decision/rationale, source=skill, confidence 1-10), and best-effort (|| true) so a decision-log failure never blocks the workflow. Includes regen across hosts + refreshed ship golden baselines.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(memory): optional gbrain --semantic recall for decision search

  Adds gstack-decision-search --semantic (with --query): appends a 'Related from memory' block from gbrain semantic search, scoped to the curated-memory source.

  Pure enhancement, reliability-first: a new lib/gstack-decision-semantic.ts is the ONLY decision module that touches gbrain and is imported lazily only on --semantic, so the reliable file path never loads gbrain code. Every path degrades to the reliable file results when gbrain is off, unconfigured, empty, or errors (never throws, 10s timeout).

  Built against the verified gbrain 0.42.x surface (text output [score] slug -- snippet, NOT JSON; curated-memory source resolved by worktree path, not a gstack-brain-<user> id). Deterministic-contract tests only: parser units, degrade-to-null when gbrain absent, and a fake-gbrain shim proving scope+search end-to-end. find-contradictions deferred (no verifiable CLI surface yet + curated memory not indexed).

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(gbrain-sync): self-heal stale autopilot lock (dead-pid)

  detectAutopilot treated a lock FILE as proof of life, so a crashed gbrain daemon left a stale lock that wedged every sync forever (observed: a dead pid refused --full indefinitely). Now read the holder pid (bare or JSON body) and check liveness via signal-0: ESRCH=dead → ignore the stale signal and keep checking; EPERM=alive (other user) → active. A stale lock never masks a live autopilot process. Pure decision function — does not delete the file; the caller may clean it.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(review): drop stray trailing code fence in TODOS-format

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(test): align section-loading E2E testNames with their TOUCHFILES keys

  Pre-existing on main (v1.56.x): the two section-loading E2E tests used human-label testNames ('/ship section-loading') that don't match their slug keys ('ship-section-loading') in E2E_TOUCHFILES/E2E_TIERS. Every other E2E test uses the slug as its testName, and the TOUCHFILES completeness gate requires testName to be a registered key — so the gate was red. Align both testNames to their slug keys (also fixes tier lookup for these two periodic tests). Verified failing on a clean origin/main checkout before the fix.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: pre-landing review fixes (datamark, DRY, compact, coverage)

  Addresses the pre-landing review findings (all INFORMATIONAL, no criticals):
  - security: datamark resurfaced decision text at the render boundary (lib/gstack-decision.ts datamark() — neutralizes code fences, --- banners, <|role|>/</system> markers, control chars, newlines). Applied in gstack-decision-search human output so stored text can't masquerade as instructions in Context Recovery (codex hardening #3 / AC #7). --json stays raw.
  - DRY: extract resolveSlug/gitBranch/flagValue to lib/bin-context.ts; both decision bins use it instead of duplicating the helpers.
  - compact(): batch the archive append (one write, not N) and shrink the mid-compact crash window; simplify the opaque branch/issue ternary.
  - coverage: learnings-log injection rejection (D2A wiring), search --recent/--scope + NaN-safe --recent, datamark-applied, unparseable lock body, compact-empty, corrupt-snapshot degrade.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(security): close adversarial-review findings in decision memory

  Adversarial review (Claude subagent) found a CRITICAL the specialist pass missed:
  - F1 (CRITICAL): 'Human:'/'Assistant:' turn-prefixes bypassed BOTH the write-time denylist AND datamark(), landing verbatim in agent context inside the trusted ACTIVE DECISIONS fence. Add 'human:' (+ 'disregard previous', 'from now on') to the shared denylist, and have datamark() neutralize Human:/Assistant:/System:/User: turn-prefixes (ZWSP) at the render boundary.
  - F2: datamark() only stripped ASCII C0; extend to Unicode line terminators (U+0085/2028/2029) and U+007F so 'strip newlines' actually holds.
  - F3: validateDecide blocked only HIGH secrets; MEDIUM-tier PII (e.g. SSN) persisted silently and synced cross-machine. The store is non-interactive (no confirm path), so fail closed on MEDIUM too.
  - F4: compact() was a lock-free read-modify-rewrite that could clobber a concurrent append (lost decision). Add an O_EXCL compact lock + a pre-rename size recheck that aborts untouched (skipped=true) if an append landed; caller re-runs.
  - F7: filterByScope unknown/garbage scope fell through to 'return true' (leaked into every context); fail conservative (false).

  F5 (pid reuse) and F6 (pgrep over-match) are intentionally left as-is: both fail SAFE (over-refuse sync); making them precise would introduce a fail-DANGEROUS path (allowing sync during a real autopilot). True disambiguation needs gbrain to stamp the lock with a start-time, which gstack doesn't own. F8 (compact moves history to archive) is by design.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(security): close cross-model (Codex) adversarial findings

  Codex adversarial review found a HIGH the Claude pass missed plus 3 mediums:
  - C1 (HIGH): gstack-decision-search --all returned every decide and IGNORED redact events, so a redacted secret still resurfaced via --all until compact ran. --all now excludes redacted (redact = expunge from every read path), still showing superseded history.
  - C-med: semantic (external gbrain) slug/snippet were printed raw — datamark them too so a gbrain hit can't spoof role markers / fences into agent context.
  - C4: semanticRecall fell back to an UNSCOPED gbrain search when no curated-memory source resolved, pulling code/doc corpora mislabeled as 'related decisions'. Now returns null (degrade) when there's no worktree-backed memory source.
  - C5: validateDecide scanned only decision/rationale/alternatives; branch and issue are stored + surfaced (raw via --json), so include them in the injection+secret scan.

  C2 (snapshot staleness) / C3 (compact TOCTOU residual): accepted for a single-user store — atomic appends never lose the event, rebuilds self-heal, and the compact size-recheck leaves only a sub-ms window; full append-locking would break the lock-free append design.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.5.0)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
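The F4 fix in the commit message pairs an `O_EXCL` compact lock with a pre-rename size recheck so that a lock-free append is never clobbered by a compaction pass. The sketch below shows that pattern with Node's `fs` module under stated assumptions: `compactLog`, the `keep` predicate, the `.compact.lock`/`.compact.tmp` file names, and the `beforeRename` test hook are hypothetical and are not the actual `compact()` in lib/gstack-decision.ts (which also archives superseded decisions).

```typescript
import {
  appendFileSync,
  closeSync,
  existsSync,
  mkdtempSync,
  openSync,
  readFileSync,
  renameSync,
  statSync,
  unlinkSync,
  writeFileSync,
} from "fs";
import { tmpdir } from "os";
import { join } from "path";

/** Outcome of one compaction attempt; `skipped` means the log was left untouched. */
interface CompactOutcome {
  skipped: boolean;
  reason?: "locked" | "raced";
}

/** Take an exclusive lock file, or return null if another compactor already holds it. */
function tryLock(lockPath: string): number | null {
  try {
    // "wx" opens with O_CREAT | O_EXCL, so creation fails with EEXIST if the file exists.
    return openSync(lockPath, "wx");
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return null;
    throw err;
  }
}

/** Hypothetical compactor: keep only lines `keep` accepts. Appenders never take the lock. */
function compactLog(logPath: string, keep: (line: string) => boolean, beforeRename?: () => void): CompactOutcome {
  const lockPath = `${logPath}.compact.lock`;
  const fd = tryLock(lockPath);
  if (fd === null) return { skipped: true, reason: "locked" };
  const tmpPath = `${logPath}.compact.tmp`;
  try {
    const buf = readFileSync(logPath);
    const sizeAtRead = buf.length; // exactly the bytes this pass has seen
    const kept = buf
      .toString("utf-8")
      .split("\n")
      .filter((line) => line.trim() !== "" && keep(line));
    writeFileSync(tmpPath, kept.map((line) => `${line}\n`).join(""));
    beforeRename?.(); // test hook: lets the demo simulate an append landing mid-compaction
    // Pre-rename size recheck: if the log grew since our read, abort and leave it untouched.
    if (statSync(logPath).size !== sizeAtRead) {
      unlinkSync(tmpPath);
      return { skipped: true, reason: "raced" };
    }
    renameSync(tmpPath, logPath); // atomic replace on POSIX when both paths share a filesystem
    return { skipped: false };
  } finally {
    closeSync(fd);
    unlinkSync(lockPath); // only reached when this pass created the lock
  }
}

// Demo on a scratch log.
const dir = mkdtempSync(join(tmpdir(), "compact-demo-"));
const log = join(dir, "decisions.jsonl");
writeFileSync(log, '{"id":"a"}\n{"id":"b","superseded":true}\n');

const first = compactLog(log, (line) => !line.includes("superseded"));
const afterFirst = readFileSync(log, "utf-8");

// An append that lands between the read and the rename makes the pass bail out.
const raced = compactLog(log, () => false, () => appendFileSync(log, '{"id":"c"}\n'));
const afterRace = readFileSync(log, "utf-8");

// A lock left by another compactor makes the pass skip without touching the log.
writeFileSync(`${log}.compact.lock`, "");
const locked = compactLog(log, () => false);
const lockStillThere = existsSync(`${log}.compact.lock`);
```

Comparing against the byte count of the buffer actually read, rather than a separate `statSync` taken before the read, means any append that lands after the read aborts the rewrite. The short window between the final `statSync` and `renameSync` is not closed; that is the residual the C3 note accepts for a single-user store.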
@@ -0,0 +1,28 @@
/**
 * bin-context — tiny shared helpers for non-interactive gstack bins that need the
 * project slug, current branch, and argv flags. Extracted from the decision bins
 * (gstack-decision-log / gstack-decision-search) so the slug/branch/flag plumbing
 * lives in one audited place instead of being copy-pasted per bin.
 */

import { spawnSync } from "child_process";

/** Resolve the project slug via the `gstack-slug` helper (parses `SLUG=...`). */
export function resolveSlug(slugBinPath: string): string {
  const r = spawnSync(slugBinPath, { encoding: "utf-8" });
  const m = (r.stdout || "").match(/^SLUG=(.+)$/m);
  return m ? m[1].trim() : "unknown";
}

/** Current git branch, or undefined on detached HEAD / outside a repo. */
export function gitBranch(): string | undefined {
  const r = spawnSync("git", ["rev-parse", "--abbrev-ref", "HEAD"], { encoding: "utf-8" });
  const b = (r.stdout || "").trim();
  return b && b !== "HEAD" ? b : undefined;
}

/** The value following `--flag` in argv, or undefined if absent. */
export function flagValue(args: string[], name: string): string | undefined {
  const i = args.indexOf(name);
  return i >= 0 ? args[i + 1] : undefined;
}
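The parsing inside `resolveSlug`, `gitBranch`, and `flagValue` can be exercised without `gstack-slug` or git installed by separating it from the `spawnSync` calls. In this sketch, `parseSlug` and `parseBranch` are hypothetical names that reuse the regex and post-processing shown above, `flagValue` is copied unchanged, and the sample stdout strings are made up.

```typescript
/** Same regex as resolveSlug: the `m` flag lets ^/$ match at each line, not just the string ends. */
export function parseSlug(stdout: string): string {
  const m = stdout.match(/^SLUG=(.+)$/m);
  return m ? m[1].trim() : "unknown";
}

/** Same post-processing as gitBranch: `git rev-parse --abbrev-ref HEAD` prints "HEAD" when detached. */
export function parseBranch(stdout: string): string | undefined {
  const b = stdout.trim();
  return b && b !== "HEAD" ? b : undefined;
}

/** Unchanged copy of bin-context's flagValue. */
export function flagValue(args: string[], name: string): string | undefined {
  const i = args.indexOf(name);
  return i >= 0 ? args[i + 1] : undefined;
}

const slug = parseSlug("# made-up helper output\nSLUG=acme-app\n");
const missing = parseSlug("");
const detached = parseBranch("HEAD\n");
const branch = parseBranch("feat/decision-memory\n");
const scope = flagValue(["--scope", "branch", "--json"], "--scope");
const dangling = flagValue(["--json", "--scope"], "--scope");
const swallowed = flagValue(["--scope", "--json"], "--scope");
```

The last case shows a consequence of the plain `indexOf` lookup: `flagValue` returns whatever token follows the flag, so `--scope --json` yields `"--json"` rather than `undefined`.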
+43 -2
@@ -29,7 +29,7 @@
  */
 
 import { spawnSync } from "child_process";
-import { existsSync, realpathSync } from "fs";
+import { existsSync, realpathSync, readFileSync } from "fs";
 import { homedir } from "os";
 import { join, resolve, sep } from "path";
 import { execGbrainJson, execGbrainText, NEEDS_SHELL_ON_WINDOWS } from "./gbrain-exec";
@@ -92,7 +92,20 @@ export function detectAutopilot(
     join(homedir(), ".gbrain", "autopilot.pid"),
   ];
   for (const lp of lockPaths) {
-    if (existsSync(lp)) return { active: true, signal: `lock:${lp}` };
+    if (!existsSync(lp)) continue;
+    // A lock FILE alone is not proof of life — a crashed daemon leaves a stale
+    // lock that would otherwise wedge every sync forever (observed: a dead pid
+    // refused --full indefinitely). Read the holder pid and check liveness.
+    const pid = readLockPid(lp);
+    if (pid === null) {
+      // Can't introspect (no parseable pid) → stay conservative: treat as active.
+      return { active: true, signal: `lock:${lp}` };
+    }
+    if (isPidAlive(pid)) {
+      return { active: true, signal: `lock:${lp} (pid ${pid})` };
+    }
+    // Stale lock (holder pid is dead): ignore this signal, keep checking. Pure
+    // decision function — we do NOT delete the file here; the caller may clean it.
   }
   // Primary signal: a live `gbrain autopilot` process.
   const running = (probe.processRunning ?? defaultProcessRunning)();
@@ -100,6 +113,34 @@ export function detectAutopilot(
   return { active: false, signal: null };
 }
 
+/** Read the holder pid from a lock/pid file. Returns null if no integer pid is present. */
+function readLockPid(lockPath: string): number | null {
+  try {
+    const raw = readFileSync(lockPath, "utf-8").trim();
+    // Files seen: a bare pid ("65495"), or JSON like {"pid":65495,...}.
+    const m = raw.match(/"pid"\s*:\s*(\d+)/) ?? raw.match(/^(\d+)$/);
+    if (!m) return null;
+    const pid = Number.parseInt(m[1], 10);
+    return Number.isFinite(pid) && pid > 0 ? pid : null;
+  } catch {
+    return null;
+  }
+}
+
+/**
+ * Liveness via signal 0: no signal sent, just an existence/permission check.
+ * ESRCH → dead; EPERM → alive but owned by another user. Cross-host pids are
+ * meaningless, but the autopilot lock is same-host by construction.
+ */
+function isPidAlive(pid: number): boolean {
+  try {
+    process.kill(pid, 0);
+    return true;
+  } catch (err) {
+    return (err as NodeJS.ErrnoException).code === "EPERM";
+  }
+}
+
 function defaultProcessRunning(): boolean {
   // No reliable pgrep on Windows; rely on the lock-file signal there.
   if (process.platform === "win32") return false;
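The stale-lock rule in `detectAutopilot` reduces to a small decision over the lock body and a liveness probe. The sketch below restates `readLockPid` over a string instead of a path and copies `isPidAlive`, then adds a hypothetical `classifyLock` whose liveness check is injectable so the stale and unparseable branches can be exercised without a real dead pid. The lock bodies are made-up examples of the two formats the code comment mentions.

```typescript
/** Verdict for one lock file, mirroring the branches of the detectAutopilot loop. */
type LockVerdict = "active-unparseable" | "active-alive" | "stale";

/** Same parsing as readLockPid, applied to the file body. */
export function parseLockPid(raw: string): number | null {
  const body = raw.trim();
  const m = body.match(/"pid"\s*:\s*(\d+)/) ?? body.match(/^(\d+)$/);
  if (!m) return null;
  const pid = Number.parseInt(m[1], 10);
  return Number.isFinite(pid) && pid > 0 ? pid : null;
}

/** Signal-0 probe as in isPidAlive: EPERM means the process exists but belongs to another user. */
export function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as { code?: string }).code === "EPERM";
  }
}

/** Hypothetical classifier; `alive` defaults to the real probe but can be stubbed. */
export function classifyLock(raw: string, alive: (pid: number) => boolean = isPidAlive): LockVerdict {
  const pid = parseLockPid(raw);
  if (pid === null) return "active-unparseable"; // can't introspect: stay conservative
  return alive(pid) ? "active-alive" : "stale";
}

const bare = classifyLock("65495", () => false);
const json = classifyLock('{"pid":65495,"started":"2026-06-10"}', () => true);
const junk = classifyLock("not a pid");
const self = isPidAlive(process.pid);
```

An unparseable body stays active on purpose: as with F5 and F6 in the commit message, the conservative direction is to over-refuse a sync rather than run one during a live autopilot.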
+58 -1
@@ -11,7 +11,7 @@
 
 import { execFileSync, spawnSync } from "child_process";
 import { withErrorContext } from "./gstack-memory-helpers";
-import { NEEDS_SHELL_ON_WINDOWS } from "./gbrain-exec";
+import { execGbrainJson, NEEDS_SHELL_ON_WINDOWS } from "./gbrain-exec";
 
 export interface SourceState {
   /** "absent" — id not registered. "match" — id at expected path. "drift" — id at different path. */
@@ -217,3 +217,60 @@ export function sourcePageCount(id: string, env?: NodeJS.ProcessEnv): number | n
     return null;
   }
 }
+
+/**
+ * Whether a source's call graph has been built.
+ *
+ *   "completed" — `gbrain dream` has run a full maintenance cycle, so the
+ *                 brain-global `resolve_symbol_edges` phase populated this
+ *                 source's call graph (`gbrain code-callers`/`code-callees`
+ *                 return edges).
+ *   "never"     — a cycle has provably NOT completed for this source.
+ *   "unknown"   — doctor is unavailable, unparseable, or reports a failure
+ *                 that doesn't name this source. Callers MUST treat unknown
+ *                 conservatively (the orchestrator skips auto-dream and WARNs
+ *                 rather than launch a ~35-min cycle on a flaky-doctor signal —
+ *                 see the `gbrain-doctor-overstrict` learning).
+ */
+export type CycleStatus = "completed" | "never" | "unknown";
+
+interface DoctorCheck {
+  name?: string;
+  status?: string;
+  message?: string;
+}
+interface DoctorReport {
+  checks?: DoctorCheck[];
+}
+
+/**
+ * Read `gbrain doctor --json --fast` and decide whether <sourceId>'s call
+ * graph is built, by inspecting the `cycle_freshness` check.
+ *
+ * Decision table (cycle_freshness.status / message):
+ *   - ok → "completed"
+ *   - fail|warn AND message names <sourceId> → "never"
+ *   - fail|warn AND message omits <sourceId> → "unknown" (a real failure
+ *     about OTHER sources must not be silently read as completed for us)
+ *   - check absent / doctor null / other status → "unknown"
+ *
+ * `sourceId` is matched as a LITERAL substring (not a regex) so an id with
+ * regex metacharacters can never misfire. Routes through `execGbrainJson` so
+ * DATABASE_URL is seeded from gbrain's config (consistent with every other
+ * gstack-side gbrain call). `env` is the caller's base env (tests inject a
+ * shim on PATH).
+ */
+export function cycleCompleted(sourceId: string, env?: NodeJS.ProcessEnv): CycleStatus {
+  const report = execGbrainJson<DoctorReport>(["doctor", "--json", "--fast"], { baseEnv: env });
+  if (!report || !Array.isArray(report.checks)) return "unknown";
+
+  const check = report.checks.find((c) => c.name === "cycle_freshness");
+  if (!check) return "unknown";
+
+  if (check.status === "ok") return "completed";
+  if (check.status === "fail" || check.status === "warn") {
+    const msg = check.message || "";
+    return msg.includes(sourceId) ? "never" : "unknown";
+  }
+  return "unknown";
+}
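Because `cycleCompleted` is a thin wrapper around `execGbrainJson`, its decision table can be exercised on an already-parsed report. `classifyCycle` below is a hypothetical pure copy of that table. Only the `cycle_freshness` check name and the `name`/`status`/`message` fields come from the code above; the other check name and all doctor messages are invented for illustration.

```typescript
type CycleStatus = "completed" | "never" | "unknown";

interface DoctorCheck {
  name?: string;
  status?: string;
  message?: string;
}
interface DoctorReport {
  checks?: DoctorCheck[];
}

/** The cycleCompleted decision table, minus the gbrain call. */
export function classifyCycle(report: DoctorReport | null, sourceId: string): CycleStatus {
  if (!report || !Array.isArray(report.checks)) return "unknown";
  const check = report.checks.find((c) => c.name === "cycle_freshness");
  if (!check) return "unknown";
  if (check.status === "ok") return "completed";
  if (check.status === "fail" || check.status === "warn") {
    // Literal substring match: "." or "(" inside an id are ordinary characters here.
    return (check.message ?? "").includes(sourceId) ? "never" : "unknown";
  }
  return "unknown";
}

// Builds a made-up report whose cycle_freshness check has the given status/message.
const freshness = (status: string, message = ""): DoctorReport => ({
  checks: [{ name: "connection", status: "ok" }, { name: "cycle_freshness", status, message }],
});

const results = {
  ok: classifyCycle(freshness("ok"), "gstack-main"),
  namesUs: classifyCycle(freshness("fail", "never cycled: gstack-main"), "gstack-main"),
  namesOther: classifyCycle(freshness("warn", "stale: docs-site"), "gstack-main"),
  noDoctor: classifyCycle(null, "gstack-main"),
  odd: classifyCycle(freshness("skipped"), "gstack-main"),
  metachar: classifyCycle(freshness("fail", "never cycled: repoXv2"), "repo.v2"),
  prefix: classifyCycle(freshness("fail", "never cycled: gstack-main"), "gstack"),
};
```

The `metachar` case shows why literal matching matters: as a regex, `repo.v2` would also match `repoXv2`. The `prefix` case shows a limit of substring matching: an id that is a prefix of another source's id (here `gstack` versus `gstack-main`) also reads as `"never"`, which under `--full` would trigger the automatic dream stage.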
@@ -0,0 +1,93 @@
/**
 * gstack-decision-semantic — OPTIONAL gbrain enhancement for decision resurfacing.
 *
 * This is the ONLY decision module that touches gbrain. The reliable core
 * (lib/gstack-decision.ts) has zero gbrain imports and works with gbrain OFF; this
 * module is loaded lazily by `gstack-decision-search` only on `--semantic`, and every
 * path degrades to `null` (caller shows the reliable file results) when gbrain is
 * absent, unconfigured, times out, or errors; a search that matches nothing yields
 * `[]`. It NEVER throws and NEVER hangs (each gbrain spawn has a 10s timeout). No
 * core function depends on this: gbrain is an enhancement, never a dependency (the code-search lesson).
 *
 * Surface reality (verified against gbrain 0.42.x, not guessed):
 *   - `gbrain search "<q>"` prints TEXT lines `[score] slug -- snippet`, NOT JSON
 *     (so we parse the text surface; execGbrainJson would always return null here).
 *   - The curated-memory source is the one whose local_path is the gstack brain
 *     worktree (`~/.gstack-brain-worktree`), id `default` by convention — NOT a
 *     `gstack-brain-<user>` id. Scoping search to it keeps code/doc corpora out.
 */

import { spawnGbrain } from "./gbrain-exec";
import { parseSourcesList } from "./gbrain-sources";

const TIMEOUT_MS = 10_000;
const BRAIN_WORKTREE_SUFFIX = ".gstack-brain-worktree";

export interface SemanticHit {
  score: number;
  slug: string;
  snippet: string;
}

/**
 * Resolve the curated-memory source id (the gstack brain worktree). Returns null
 * when gbrain is down/unparseable OR no worktree-backed source is registered; the
 * caller (semanticRecall) then returns null so search falls back to the reliable
 * file results, rather than running an unscoped search over code/doc corpora.
 */
export function resolveMemorySourceId(env?: NodeJS.ProcessEnv): string | null {
  const r = spawnGbrain(["sources", "list", "--json"], { baseEnv: env, timeout: TIMEOUT_MS });
  if (r.status !== 0) return null;
  let rows;
  try {
    rows = parseSourcesList(JSON.parse(r.stdout || "null"));
  } catch {
    return null;
  }
  const atWorktree = rows.filter(
    (s) => typeof s.local_path === "string" && s.local_path.endsWith(BRAIN_WORKTREE_SUFFIX),
  );
  const pick = atWorktree.find((s) => s.id === "default") ?? atWorktree[0];
  return pick?.id ?? null;
}

/**
 * Parse gbrain search's text output into scored hits. Lines look like:
 *   `[0.4361] slug -- snippet text...`
 * Non-matching lines (banners, blanks) are skipped. Exported for deterministic
 * unit testing of the parser without a live gbrain.
 */
export function parseSearchHits(stdout: string, minScore: number, limit: number): SemanticHit[] {
  const hits: SemanticHit[] = [];
  for (const line of stdout.split("\n")) {
    const m = line.match(/^\[([\d.]+)\]\s+(\S+)\s+--\s+(.*)$/);
    if (!m) continue;
    const score = parseFloat(m[1]);
    if (!Number.isFinite(score) || score < minScore) continue;
    hits.push({ score, slug: m[2], snippet: m[3].trim() });
  }
  return hits.slice(0, limit);
}

/**
 * Semantic recall over the curated-memory source. Returns parsed hits, or `null`
 * for an empty query, when no curated-memory source resolves, or when gbrain is
 * unavailable / errors (caller MUST degrade to the reliable file results on null).
 * An empty array means gbrain ran but found nothing relevant (e.g. memory not
 * synced yet) — also honest, distinct from null. Never throws, never hangs.
 */
export function semanticRecall(
  query: string,
  env?: NodeJS.ProcessEnv,
  minScore = 0.1,
  limit = 3,
): SemanticHit[] | null {
  if (!query.trim()) return null;
  // Require the curated-memory source. If it's absent (gbrain down OR no worktree-backed
  // source), degrade to null rather than searching UNSCOPED — an unscoped search pulls
  // code/doc corpora that would be mislabeled as "related decisions" (Codex finding).
  const sourceId = resolveMemorySourceId(env);
  if (!sourceId) return null;
  const r = spawnGbrain(["search", query, "--source", sourceId], { baseEnv: env, timeout: TIMEOUT_MS });
  if (r.status !== 0) return null; // gbrain down / not on PATH / errored → degrade
  return parseSearchHits(r.stdout || "", minScore, limit);
}
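`parseSearchHits` is exported so the text surface can be tested without a live gbrain. The copy below is unchanged from the module above; the sample output lines are invented but follow the `[score] slug -- snippet` shape the module header documents.

```typescript
export interface SemanticHit {
  score: number;
  slug: string;
  snippet: string;
}

/** Copy of parseSearchHits: keep `[score] slug -- snippet` lines at or above minScore, then cap at limit. */
export function parseSearchHits(stdout: string, minScore: number, limit: number): SemanticHit[] {
  const hits: SemanticHit[] = [];
  for (const line of stdout.split("\n")) {
    const m = line.match(/^\[([\d.]+)\]\s+(\S+)\s+--\s+(.*)$/);
    if (!m) continue; // banners and blank lines
    const score = parseFloat(m[1]);
    if (!Number.isFinite(score) || score < minScore) continue;
    hits.push({ score, slug: m[2], snippet: m[3].trim() });
  }
  return hits.slice(0, limit);
}

// Invented gbrain output: a banner, three hits above the floor, a blank, one below the floor.
const sample = [
  "Searching source default...",
  "[0.8120] decisions/event-log -- Store decisions as an append-only event log",
  "[0.4361] decisions/minor-bump -- Ship bumps MINOR -- new bins added",
  "",
  "[0.0500] notes/unrelated -- below the score floor",
  "[0.3000] decisions/third -- dropped by the limit",
].join("\n");

const hits = parseSearchHits(sample, 0.1, 2);
```

Filtering happens before the `limit` cap, and hits keep gbrain's output order; the parser does not re-sort by score. A snippet may itself contain ` -- `, because `(\S+)` stops at the first whitespace after the slug and everything after the first separator is captured as the snippet.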
@@ -0,0 +1,325 @@
/**
 * gstack-decision — event-sourced institutional decision memory.
 *
 * decisions.jsonl is an APPEND-ONLY EVENT LOG (not mutable rows): `decide`,
 * `supersede`, and `redact` events. "Active" is COMPUTED — a `decide` whose id is
 * not later referenced by a `supersede`/`redact`. This is the eng-review event-
 * sourcing decision (a mutable `status` field would contradict append-only).
 *
 * Built on lib/jsonl-store.ts (shared injection-reject + atomic append + tolerant
 * read). Free-text fields are injection-checked AND redact-scanned on write
 * (HIGH-tier secret or MEDIUM-tier PII → reject), so a secret never silently
 * persists and resurfaced text can't carry instructions. gbrain is never required:
 * this is the reliable file-only core; semantic recall is a separate, optional
 * enhancement (lib/gstack-decision-semantic.ts).
 */

import { join } from "path";
import { homedir } from "os";
import { randomUUID } from "crypto";
import { writeFileSync, renameSync, existsSync, readFileSync, appendFileSync, statSync, openSync, closeSync, unlinkSync } from "fs";
import { appendJsonl, readJsonl, hasInjection } from "./jsonl-store";
import { scan } from "./redact-engine";

export type DecisionKind = "decide" | "supersede" | "redact";
export type DecisionScope = "repo" | "branch" | "issue";
export type DecisionSource = "user" | "skill" | "agent";

export const DECISION_SCOPES: readonly DecisionScope[] = ["repo", "branch", "issue"];
export const DECISION_SOURCES: readonly DecisionSource[] = ["user", "skill", "agent"];

export interface DecisionEvent {
  id: string;
  kind: DecisionKind;
  decision?: string;
  rationale?: string;
  alternatives_considered?: string;
  /** For supersede/redact: the id of the `decide` event being acted on. */
  supersedes?: string;
  scope: DecisionScope;
  branch?: string;
  issue?: string;
  date: string;
  session?: string;
  source: DecisionSource;
  confidence?: number;
}

export interface ActiveDecision extends DecisionEvent {
  kind: "decide";
}

export interface DecisionPaths {
  log: string;
  snapshot: string;
  archive: string;
}

/** Resolve the per-project decision store paths. Bins pass slug + GSTACK_HOME. */
export function decisionPaths(slug: string, gstackHome?: string): DecisionPaths {
  const home = gstackHome || process.env.GSTACK_HOME || join(homedir(), ".gstack");
  const dir = join(home, "projects", slug || "unknown");
  return {
    log: join(dir, "decisions.jsonl"),
    snapshot: join(dir, "decisions.active.json"),
    archive: join(dir, "decisions.archive.jsonl"),
  };
}

/**
 * Datamark resurfaced decision text so a stored string can't masquerade as
 * instructions or break out of the Context Recovery fence when it lands in agent
 * context (codex hardening #3: resurface = DATA, not instructions). Write-time
 * `hasInjection` is a denylist; this is the render-boundary defense-in-depth that
 * also covers `--all`/snapshot reads and records written before a pattern existed.
 * Neutralizes: control chars, newlines (defensive — events are single-line),
 * code fences, `---` banner sentinels, `<|role|>` / `</system>` markers, and
 * chat turn-prefixes (`Human:`, `Assistant:`, `System:`, `User:`).
 */
export function datamark(text: string): string {
  const ZWSP = "\u200b"; // zero-width space: breaks token recognition, near-invisible
  return text
    // Replace C0 control chars, DEL (U+007F), and the Unicode line terminators
    // U+0085/2028/2029 with a space; the latter render as newlines in many
    // tokenizers/markdown, so "strip newlines" must cover them.
    .replace(/[\u0000-\u001f\u007f\u0085\u2028\u2029]/g, " ")
    .replace(/`{3,}/g, "'''") // neutralize markdown code fences
    .replace(/-{3,}/g, "\u2014") // neutralize `---` banner sentinels (em dash)
    .replace(/<\|/g, `<${ZWSP}|`) // neutralize <|im_start|>-style chat markers
    .replace(/\|>/g, `|${ZWSP}>`)
    .replace(/<(\/?)(system|user|assistant|tool)>/gi, `<${ZWSP}$1$2>`) // neutralize role tags
    // Neutralize chat turn-prefixes (Human:/Assistant:/System:/User:); they slip past
    // the angle-tag pass above and are Claude's native turn delimiters.
    .replace(/\b(human|assistant|system|user)(\s*):/gi, `$1${ZWSP}$2:`);
}

export type ValidateResult =
  | { ok: true; event: DecisionEvent }
  | { ok: false; error: string };

/**
 * Validate + stamp a `decide` event. Rejects (no silent persist) on:
 *   - missing/empty decision text, an invalid scope/source, or a confidence that
 *     is not an integer 1-10,
 *   - injection-like content (write-time `hasInjection` denylist) in any stored
 *     free-text field, including branch/issue,
 *   - a HIGH-tier secret or MEDIUM-tier PII/credential-shaped finding (redact
 *     engine) in any stored free-text field.
 */
export function validateDecide(input: Partial<DecisionEvent>): ValidateResult {
|
||||
if (!input.decision || typeof input.decision !== "string" || !input.decision.trim()) {
|
||||
return { ok: false, error: "decision text is required" };
|
||||
}
|
||||
const scope = input.scope ?? "repo";
|
||||
if (!DECISION_SCOPES.includes(scope)) {
|
||||
return { ok: false, error: `invalid scope "${scope}"; must be ${DECISION_SCOPES.join("|")}` };
|
||||
}
|
||||
const source = input.source ?? "agent";
|
||||
if (!DECISION_SOURCES.includes(source)) {
|
||||
return { ok: false, error: `invalid source "${source}"; must be ${DECISION_SOURCES.join("|")}` };
|
||||
}
|
||||
if (input.confidence !== undefined) {
|
||||
const c = Number(input.confidence);
|
||||
if (!Number.isInteger(c) || c < 1 || c > 10) {
|
||||
return { ok: false, error: "confidence must be integer 1-10" };
|
||||
}
|
||||
}
|
||||
|
||||
// Scan ALL stored free-text — incl. branch/issue, which are surfaced (and emitted raw
|
||||
// via --json), so they must not carry secrets or injection either (Codex finding).
|
||||
const freeText = [input.decision, input.rationale, input.alternatives_considered, input.branch, input.issue]
|
||||
.filter((s): s is string => typeof s === "string")
|
||||
.join("\n");
|
||||
|
||||
if (hasInjection(freeText)) {
|
||||
return { ok: false, error: "decision contains instruction-like content (injection), rejected" };
|
||||
}
|
||||
const redacted = scan(freeText);
|
||||
if (redacted.counts.HIGH > 0) {
|
||||
return {
|
||||
ok: false,
|
||||
error: `decision contains a HIGH-tier secret (${redacted.counts.HIGH} finding(s)); rotate + remove it, do not log secrets`,
|
||||
};
|
||||
}
|
||||
// MEDIUM = PII / credential-shaped content. The taxonomy says "confirm via
|
||||
// AskUserQuestion", but this store is NON-INTERACTIVE and syncs cross-machine,
|
||||
// so there is no confirm path — fail closed rather than silently persist + sync a
|
||||
// secret that later resurfaces into agent context.
|
||||
if (redacted.counts.MEDIUM > 0) {
|
||||
return {
|
||||
ok: false,
|
||||
error: `decision contains MEDIUM-tier sensitive content (${redacted.counts.MEDIUM} finding(s): PII or credential-shaped). This store is non-interactive and syncs across machines, so it fails closed — remove or rephrase the value before logging.`,
|
||||
};
|
||||
}
|
||||
|
||||
const event: DecisionEvent = {
|
||||
id: input.id || randomUUID(),
|
||||
kind: "decide",
|
||||
decision: input.decision.trim(),
|
||||
rationale: input.rationale,
|
||||
alternatives_considered: input.alternatives_considered,
|
||||
scope,
|
||||
branch: input.branch || undefined,
|
||||
issue: input.issue || undefined,
|
||||
date: input.date || new Date().toISOString(),
|
||||
session: input.session,
|
||||
source,
|
||||
confidence: input.confidence === undefined ? undefined : Number(input.confidence),
|
||||
};
|
||||
return { ok: true, event };
|
||||
}
|
||||
|
||||
/** Build a supersede/redact event referencing an existing decide-event id. */
export function makeRefEvent(kind: "supersede" | "redact", targetId: string, opts: { session?: string; source?: DecisionSource } = {}): DecisionEvent {
  return {
    id: randomUUID(),
    kind,
    supersedes: targetId,
    scope: "repo",
    date: new Date().toISOString(),
    session: opts.session,
    source: opts.source ?? "agent",
  };
}

/**
 * Compute the ACTIVE decisions: `decide` events whose id is NOT referenced by any
 * later `supersede`/`redact`. Dangling refs (supersede/redact pointing at an id
 * that has no `decide`) are tolerated — ignored, never thrown. Returned in date
 * order (oldest first).
 */
export function computeActive(events: DecisionEvent[]): ActiveDecision[] {
  const retired = new Set<string>();
  for (const e of events) {
    if ((e.kind === "supersede" || e.kind === "redact") && e.supersedes) {
      retired.add(e.supersedes); // dangling target id is harmless — just a no-op
    }
  }
  return events
    .filter((e): e is ActiveDecision => e.kind === "decide" && !retired.has(e.id))
    .sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0));
}

/**
 * Scope filter for resurfacing: repo-scoped decisions always apply; branch-scoped
 * only when the branch matches the current context; issue-scoped only when the
 * issue matches. (Recency != relevance — callers filter by scope, not just date.)
 */
export function filterByScope(active: ActiveDecision[], ctx: { branch?: string; issue?: string }): ActiveDecision[] {
  return active.filter((d) => {
    if (d.scope === "repo") return true;
    if (d.scope === "branch") return !!ctx.branch && d.branch === ctx.branch;
    if (d.scope === "issue") return !!ctx.issue && d.issue === ctx.issue;
    return false; // unknown/garbage scope: fail conservative, don't leak into every context
  });
}

/** Append a validated event atomically (single-line, concurrency-safe). */
export function appendEvent(paths: DecisionPaths, event: DecisionEvent): void {
  appendJsonl(paths.log, event);
}

/** Read all events tolerantly (skips malformed/partial-tail lines). */
export function readEvents(paths: DecisionPaths): DecisionEvent[] {
  return readJsonl<DecisionEvent>(paths.log);
}

/**
 * Write the bounded active snapshot (`decisions.active.json`) atomically. Context
 * Recovery and search read THIS, not the full history — session start stays
 * O(active), not O(history).
 */
export function writeSnapshot(paths: DecisionPaths, active: ActiveDecision[]): void {
  const tmp = `${paths.snapshot}.tmp.${process.pid}`;
  writeFileSync(tmp, JSON.stringify(active), "utf-8");
  renameSync(tmp, paths.snapshot);
}

/** Read the bounded active snapshot. Returns [] if missing/corrupt (caller may rebuild). */
export function readSnapshot(paths: DecisionPaths): ActiveDecision[] {
  if (!existsSync(paths.snapshot)) return [];
  try {
    const v = JSON.parse(readFileSync(paths.snapshot, "utf-8"));
    return Array.isArray(v) ? (v as ActiveDecision[]) : [];
  } catch {
    return [];
  }
}

/** Recompute active from the event log and refresh the snapshot. Returns active. */
export function rebuildSnapshot(paths: DecisionPaths): ActiveDecision[] {
  const active = computeActive(readEvents(paths));
  writeSnapshot(paths, active);
  return active;
}

export interface CompactResult {
  activeCount: number;
  /** superseded decisions moved to the archive (history kept). */
  archivedCount: number;
  /** redacted decisions DROPPED entirely (expunged, NOT archived). */
  expungedCount: number;
  /** true when compaction was skipped to avoid clobbering a concurrent writer/compactor. */
  skipped?: boolean;
}

/**
 * Compact the event log to the active set.
 * - active decisions → kept in `decisions.jsonl`,
 * - superseded decisions → appended to `decisions.archive.jsonl` (history),
 * - REDACTED decisions → expunged (dropped, NOT archived) — that's redact's job:
 *   a `redact` is how an accidentally-captured secret leaves the store for good.
 *
 * Concurrency: appends are lock-free (O_APPEND), but compact is a read-modify-rewrite
 * that would clobber an append landing in its window. Two guards: (1) an O_EXCL lock
 * file serializes compactions (no double-archive / tmp tear); (2) the log size is
 * re-checked immediately before the destructive write. If an append landed since the
 * read, compact ABORTS untouched (returns skipped) and the caller re-runs. This narrows
 * the race rather than closing it: an append that lands between that re-check and the
 * rename can still be lost. Atomic rewrite (tmp + rename); refreshes the snapshot.
 */
export function compact(paths: DecisionPaths): CompactResult {
  const lockPath = `${paths.log}.compact.lock`;
  let lockFd: number;
  try {
    lockFd = openSync(lockPath, "wx"); // O_EXCL|O_CREAT — throws EEXIST if a compact holds it
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") {
      return { activeCount: computeActive(readEvents(paths)).length, archivedCount: 0, expungedCount: 0, skipped: true };
    }
    throw err;
  }
  try {
    const sizeBefore = existsSync(paths.log) ? statSync(paths.log).size : 0;
    const events = readEvents(paths);
    const active = computeActive(events);
    const activeIds = new Set(active.map((d) => d.id));
    const redactedIds = new Set(
      events.filter((e) => e.kind === "redact" && e.supersedes).map((e) => e.supersedes as string),
    );
    // Superseded = a decide that's neither active nor redacted. Archive these for history.
    const superseded = events.filter(
      (e): e is DecisionEvent => e.kind === "decide" && !activeIds.has(e.id) && !redactedIds.has(e.id),
    );
    // Count only redact targets that exist as decide events; a dangling redact
    // (no matching decide) expunges nothing and must not inflate the count.
    const expungedCount = events.filter((e) => e.kind === "decide" && redactedIds.has(e.id)).length;

    // Append-race guard: if the log grew/changed since we read it, an append landed —
    // rewriting now would drop it. Abort untouched; the caller re-runs.
    const sizeNow = existsSync(paths.log) ? statSync(paths.log).size : 0;
    if (sizeNow !== sizeBefore) {
      return { activeCount: active.length, archivedCount: 0, expungedCount: 0, skipped: true };
    }

    // One batched append (not one open/write/close per event) — matches the atomic
    // batched rewrite of the active log below and shrinks the mid-compact crash window.
    if (superseded.length) {
      appendFileSync(paths.archive, superseded.map((e) => JSON.stringify(e)).join("\n") + "\n", "utf-8");
    }

    const tmp = `${paths.log}.tmp.${process.pid}`;
    writeFileSync(tmp, active.map((d) => JSON.stringify(d)).join("\n") + (active.length ? "\n" : ""), "utf-8");
    renameSync(tmp, paths.log);
    writeSnapshot(paths, active);

    return { activeCount: active.length, archivedCount: superseded.length, expungedCount };
  } finally {
    closeSync(lockFd);
    try {
      unlinkSync(lockPath);
    } catch {
      // best-effort lock cleanup; a leftover lock file makes every later compact
      // return skipped until the file is removed
    }
  }
}
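The log fold shared by `computeActive`, `filterByScope`, and `compact` is easier to see in isolation. This is a minimal, self-contained sketch. It assumes a trimmed-down, hypothetical event shape (`SketchEvent`) and hypothetical helpers (`foldActive`, `inScope`, `partition`). It is not the module's code, and it skips validation and file I/O.

```typescript
// Hypothetical, trimmed-down event shape for illustration only (not DecisionEvent).
type Kind = "decide" | "supersede" | "redact";
interface SketchEvent {
  id: string;
  kind: Kind;
  supersedes?: string; // target decide-id for supersede/redact
  scope?: "repo" | "branch" | "issue";
  branch?: string;
  issue?: string;
  date: string; // ISO-8601 in UTC, so string comparison orders chronologically
}

// Active = decide events not targeted by any supersede/redact, oldest first.
function foldActive(events: SketchEvent[]): SketchEvent[] {
  const retired = new Set<string>();
  for (const e of events) {
    if (e.kind !== "decide" && e.supersedes) retired.add(e.supersedes);
  }
  return events
    .filter((e) => e.kind === "decide" && !retired.has(e.id))
    .sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0));
}

// Scope rule: repo always applies; branch/issue only on an exact match.
function inScope(d: SketchEvent, ctx: { branch?: string; issue?: string }): boolean {
  if (d.scope === "repo") return true;
  if (d.scope === "branch") return !!ctx.branch && d.branch === ctx.branch;
  if (d.scope === "issue") return !!ctx.issue && d.issue === ctx.issue;
  return false;
}

// Compaction partition of decide events: keep active, archive superseded,
// expunge redacted. A dangling ref targets no decide, so it changes nothing.
function partition(events: SketchEvent[]) {
  const active = foldActive(events);
  const activeIds = new Set(active.map((d) => d.id));
  const redacted = new Set(
    events.filter((e) => e.kind === "redact" && e.supersedes).map((e) => e.supersedes as string),
  );
  const decides = events.filter((e) => e.kind === "decide");
  return {
    active: active.map((d) => d.id),
    archived: decides.filter((d) => !activeIds.has(d.id) && !redacted.has(d.id)).map((d) => d.id),
    expunged: decides.filter((d) => redacted.has(d.id)).map((d) => d.id),
  };
}

const log: SketchEvent[] = [
  { id: "a", kind: "decide", scope: "repo", date: "2026-01-01T00:00:00Z" },
  { id: "b", kind: "decide", scope: "branch", branch: "main", date: "2026-01-02T00:00:00Z" },
  { id: "c", kind: "decide", scope: "repo", date: "2026-01-03T00:00:00Z" },
  { id: "s1", kind: "supersede", supersedes: "a", date: "2026-01-04T00:00:00Z" },
  { id: "r1", kind: "redact", supersedes: "c", date: "2026-01-05T00:00:00Z" },
  { id: "r2", kind: "redact", supersedes: "missing", date: "2026-01-06T00:00:00Z" },
];

const result = partition(log);
console.log(result); // active: b, archived: a, expunged: c ("missing" is ignored)
console.log(foldActive(log).filter((d) => inScope(d, { branch: "feature" })).length); // 0
```

The sketch mirrors the choice in `compact`: a superseded decision is kept in the archive for history, while a redacted one is dropped outright. That is what lets `redact` remove an accidentally captured secret for good.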
@@ -0,0 +1,96 @@
/**
 * jsonl-store — shared, audited plumbing for gstack's append-only JSONL stores.
 *
 * Single source of truth for the three things every JSONL store must get right:
 *   1. Injection sanitization (the prompt-injection patterns that must NOT survive
 *      into agent context when a record is later resurfaced).
 *   2. Atomic single-line append (concurrent agents must not corrupt the file).
 *   3. Tolerant read (a partially-written tail or one corrupt line must not take
 *      down the whole read).
 *
 * Extracted from `bin/gstack-learnings-log` (D2A) so `gstack-learnings-*` and the
 * new `gstack-decision-*` bins share ONE audited path — a new injection pattern or
 * a write-atomicity fix lands in both at once, never drifts. Per the
 * `squash-with-regen` / DRY discipline + the eng-review D2A decision.
 */

import { appendFileSync, readFileSync, existsSync } from "fs";

/**
 * Prompt-injection patterns. If any matches a free-text field (insight, rationale,
 * decision), the record is REJECTED at write time — these strings could otherwise
 * be replayed into a future agent's context as instructions when the record is
 * resurfaced. Keep this list the ONLY copy (callers import it; do not re-declare).
 */
export const INJECTION_PATTERNS: readonly RegExp[] = [
  /ignore\s+(all\s+)?previous\s+(instructions|context|rules)/i,
  /you\s+are\s+now\s+/i,
  /always\s+output\s+no\s+findings/i,
  /skip\s+(all\s+)?(security|review|checks)/i,
  /override[:\s]/i,
  /\bsystem\s*:/i,
  /\bassistant\s*:/i,
  /\buser\s*:/i,
  /\bhuman\s*:/i, // Claude's native turn prefix — bypassed the denylist AND datamark
  /disregard\s+(all\s+)?(previous|above|prior)/i,
  /from\s+now\s+on\b/i,
  /do\s+not\s+(report|flag|mention)/i,
  /approve\s+(all|every|this)/i,
];

/** True if `text` contains an instruction-like injection pattern. */
export function hasInjection(text: string): boolean {
  return INJECTION_PATTERNS.some((p) => p.test(text));
}

/** Returns the first injection pattern that matches, or null. For actionable errors. */
export function firstInjectionMatch(text: string): RegExp | null {
  return INJECTION_PATTERNS.find((p) => p.test(text)) ?? null;
}

/**
 * Atomic single-line append of `obj` as one JSON line.
 *
 * Concurrency: opens with `a` (O_APPEND), so each write is positioned at the current
 * end of file in the same atomic step. POSIX formally guarantees non-interleaving only
 * for pipe writes of at most PIPE_BUF bytes (512 on macOS, 4096 on Linux); for regular
 * files on a local filesystem, small O_APPEND writes are not interleaved in practice,
 * so concurrent agents appending short records do not corrupt each other. NFS gives
 * no such guarantee. Records MUST serialize to a single line (no embedded newline) —
 * we throw rather than risk a multi-line record breaking the one-record-per-line
 * invariant the tolerant reader relies on.
 *
 * Caveat: keep records well under PIPE_BUF; a large record has no cross-process
 * atomicity guarantee. Very large free-text should be truncated by the caller.
 */
export function appendJsonl(path: string, obj: unknown): void {
  const line = JSON.stringify(obj);
  // JSON.stringify returns undefined (not a string) for undefined, functions, and symbols.
  if (typeof line !== "string") {
    throw new Error("jsonl-store: record is not JSON-serializable");
  }
  if (line.includes("\n")) {
    throw new Error("jsonl-store: record serialized to multiple lines (embedded newline)");
  }
  appendFileSync(path, line + "\n", { encoding: "utf-8" });
}

/**
 * Tolerant reader: parse each line, SKIP malformed ones (partial-write tail, a
 * corrupt line, a non-JSON line) rather than throwing. A broken line never takes
 * down the whole read. Missing file → empty array. Unknown fields are preserved
 * (forward-compatible: a schema bump on the writer doesn't break older readers).
 */
export function readJsonl<T = unknown>(path: string): T[] {
  if (!existsSync(path)) return [];
  let raw: string;
  try {
    raw = readFileSync(path, "utf-8");
  } catch {
    return [];
  }
  const out: T[] = [];
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      out.push(JSON.parse(trimmed) as T);
    } catch {
      // Malformed line (partial tail / corruption) — skip, keep reading.
    }
  }
  return out;
}
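The append/read contract described in this file can be exercised end to end. This is a minimal sketch that assumes Node's `fs` and a scratch temp directory. `appendLine`, `readTolerant`, and `looksInjected` are hypothetical stand-ins for `appendJsonl`, `readJsonl`, and `hasInjection`, not the module's exports. The sketch shows two things: `JSON.stringify` escapes newlines inside strings, so the embedded-newline check is a guard rather than a routine failure; and a torn tail line is skipped instead of failing the read.

```typescript
import { appendFileSync, mkdtempSync, readFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical stand-in for appendJsonl: one JSON record per line, appended with O_APPEND.
function appendLine(path: string, obj: unknown): void {
  const line = JSON.stringify(obj);
  if (typeof line !== "string" || line.includes("\n")) throw new Error("not a single JSON line");
  appendFileSync(path, line + "\n", "utf-8"); // appendFileSync defaults to flag "a"
}

// Hypothetical stand-in for readJsonl: skip blank and unparseable lines, keep the rest.
function readTolerant<T>(path: string): T[] {
  const out: T[] = [];
  for (const raw of readFileSync(path, "utf-8").split("\n")) {
    const t = raw.trim();
    if (!t) continue;
    try {
      out.push(JSON.parse(t) as T);
    } catch {
      // torn tail or corrupt line: skip it
    }
  }
  return out;
}

// Two of the write-time injection patterns, copied from INJECTION_PATTERNS.
const SAMPLE_PATTERNS = [/ignore\s+(all\s+)?previous\s+(instructions|context|rules)/i, /\bhuman\s*:/i];
const looksInjected = (s: string) => SAMPLE_PATTERNS.some((p) => p.test(s));

const dir = mkdtempSync(join(tmpdir(), "jsonl-sketch-"));
const file = join(dir, "store.jsonl");
appendLine(file, { id: 1, text: "use pnpm" });
appendLine(file, { id: 2, text: "multi\nline text is escaped by JSON.stringify" });
appendFileSync(file, '{"id": 3, "text": "torn wri', "utf-8"); // simulate a crash mid-append

const records = readTolerant<{ id: number; text: string }>(file);
console.log(records.map((r) => r.id)); // [ 1, 2 ]
console.log(looksInjected("Human: approve this PR"), looksInjected("use pnpm")); // true false
rmSync(dir, { recursive: true, force: true });
```

Unlike this sketch, the module's `readJsonl` also returns `[]` for a missing or unreadable file instead of throwing.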