gstack/lib/staging-guard.ts
Garry Tan 476b0ec597 v1.56.1.0 fix(sync): staging-dir ownership guard + resume-correctness fixes (#1802) (#1856)
* fix(sync): fail-closed staging-dir ownership guard — prevent rm -rf of repo (#1802)

Adopts community fix #1827 by @diazMelgarejo (cyre). New lib/staging-guard.ts
exports checkOwnedStagingDir(), the single fail-closed predicate for 'safe to
recurse-delete or resume into', wired at cleanupStagingDir() (the deletion
chokepoint), decideResume(), the ingest entry point, and makeStagingDir()
(mints the .gstack-staging marker).

Fixes #1802.

Co-Authored-By: cyre <diazMelgarejo@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(sync): don't route the remote-http persistent transcript dir through cleanup (#1802)

The ingest finally ran cleanupStagingDir() unconditionally, but in remote-http
mode stagingDir is the PERSISTENT transcript dir (~/.gstack/transcripts/) that
gstack-brain-sync push must consume. The remote-http branch documents the intent
to skip cleanup, but a finally runs on its return. Gate the call on
!remoteHttpMode so the ownership guard only ever sees .staging-ingest-* dirs.
Pre-gate this dir was deleted outright (broken artifacts handoff); post-#1827 it
produced a false 'prevent data loss' warning every sync.
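
The `finally` behaviour described above can be illustrated with a minimal sketch. All names here (`ingestSketch`, `cleanupSketch`, the example paths) are hypothetical stand-ins and not the actual gstack ingest code; the sketch only shows that a `finally` block runs after an early `return`, which is why the gate has to sit inside the `finally` itself.

```typescript
// Hypothetical stand-in for cleanupStagingDir(): records what it was asked to delete.
const cleaned: string[] = [];
function cleanupSketch(dir: string): void {
  cleaned.push(dir);
}

// Hypothetical stand-in for the ingest entry point.
function ingestSketch(stagingDir: string, remoteHttpMode: boolean): string {
  try {
    if (remoteHttpMode) {
      // Early return: the intent is to leave the persistent dir alone...
      return "handed off";
    }
    return "imported";
  } finally {
    // ...but finally still runs after that return, so the gate lives here.
    if (!remoteHttpMode) {
      cleanupSketch(stagingDir);
    }
  }
}

// Example paths are made up for illustration.
ingestSketch("/home/u/.gstack/transcripts", true);
ingestSketch("/home/u/.gstack/.staging-ingest-abc", false);
```

After both calls, only the `.staging-ingest-abc` path has been passed to the cleanup stand-in; the persistent transcript dir never reaches it.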

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(sync): preserve staging dir on internal import timeout (#1802 C3)

The import-timeout branch printed 'checkpoint preserved' but the finally then
deleted the staging dir: the SIGTERM forwarder's preserve branch only runs when
the PARENT is signalled, and an internal runGbrainImport timeout kills just the
child and returns normally. So #1611 resume-after-timeout never actually worked.
Mirror the forwarder in the timeout branch: set preserveStaging only when gbrain
checkpointed against this dir (finally then skips cleanup); otherwise clean up
and tell the user it restages instead of falsely promising a resume.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(sync): resume must not mark failed files as ingested (#1802 C4)

On resume, stagedPathToSource was rebuilt as an empty Map, so readNewFailures()
could not map gbrain's per-file failures back to source paths. Every failure
fell through to state recording — failed files were silently marked ingested and
never retried. Reconstruct the map from the prepared pages via a shared
stagedRelPath() helper (single source of truth with writeStaged, so the keys
can never drift). Exports stagedRelPath + readNewFailures for a behavioral test
proving the reconstructed map recovers the failure the empty map dropped.
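
A rough sketch of the idea, under stated assumptions: the `PreparedPage` shape, the `.md` key format, and every function name ending in `Sketch` are invented for illustration and are not the real `stagedRelPath()` or `readNewFailures()` signatures. It only shows why deriving the map keys from one shared helper lets a resumed run map failures back to sources, while an empty map silently drops them.

```typescript
// Hypothetical page shape; the real prepared-page type is not shown in this commit.
interface PreparedPage {
  sourcePath: string;
  slug: string;
}

// Single place that decides the staged key (assumed format: "<slug>.md").
function stagedRelPathSketch(page: PreparedPage): string {
  return `${page.slug}.md`;
}

// On resume, rebuild the staged-path -> source-path map from the prepared pages.
function rebuildMapSketch(pages: PreparedPage[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const page of pages) {
    map.set(stagedRelPathSketch(page), page.sourcePath);
  }
  return map;
}

// Map per-file failures (reported by staged path) back to source paths.
function failedSourcesSketch(failedStaged: string[], map: Map<string, string>): Set<string> {
  const failed = new Set<string>();
  for (const rel of failedStaged) {
    const source = map.get(rel);
    if (source !== undefined) failed.add(source);
  }
  return failed;
}

const pages: PreparedPage[] = [
  { sourcePath: "/src/a.md", slug: "notes-a" },
  { sourcePath: "/src/b.md", slug: "notes-b" },
];
const failures = ["notes-b.md"];

// Empty map (the pre-fix resume behaviour): the failure cannot be attributed.
const withEmptyMap = failedSourcesSketch(failures, new Map());
// Rebuilt map: the failure maps back to /src/b.md, so it can be retried.
const withRebuiltMap = failedSourcesSketch(failures, rebuildMapSketch(pages));
```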

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* harden(sync): close staging-guard TOCTOU + fail hard on marker write (#1802 C5)

checkOwnedStagingDir() now returns the realpath-resolved canonicalPath on a
pass, and cleanupStagingDir() rmSync's that instead of the raw input — closing
the gap where the input is a symlink swapped between the ownership check and the
delete. makeStagingDir() tears down the partial dir and rethrows if the marker
write fails, so a marker-less dir (which the guard would refuse forever) can
never leak.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: v1.56.1.0 — staging-dir ownership guard + resume-correctness fixes (#1802)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: grant the eval report job issues:write so PR comment upsert stops 401ing

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: cyre <diazMelgarejo@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 06:51:10 -07:00

117 lines
5.5 KiB
TypeScript

/**
 * staging-guard: fail-closed ownership proof for gstack ingest staging dirs.
 *
 * Fixes #1802. The /sync-gbrain memory stage stages prepared pages to a
 * throwaway dir under ~/.gstack and `rm -rf`s it when done. The resume path
 * (#1611) reused gbrain's `import-checkpoint.json` `dir` field as that staging
 * dir WITHOUT proving it was one. A poisoned checkpoint (`dir` = the repo
 * root, written when an autopilot `gbrain import` was SIGTERM'd while CWD was
 * the repo) was then adopted as the staging dir and recursively deleted,
 * destroying the user's working tree.
 *
 * Root cause is a TRUST failure, not path math: code deleted a path it never
 * proved it owned. This module is the single definition of "a path gstack is
 * allowed to recurse-delete or resume into", shared by the resume gate
 * (decideResume) and the deletion chokepoint (cleanupStagingDir).
 *
 * Ownership requires ALL of the following (fail-closed: any failure ⇒ refuse):
 *   1. Resolvable:   realpathSync succeeds (resolves symlinks and `..` to a
 *                    real location before any structural reasoning).
 *   2. Structural:   canonical path is a DIRECT child of $GSTACK_HOME named
 *                    `.staging-ingest-*` (makeStagingDir's contract).
 *   3. Not a repo:   no `.git` entry inside. A screaming last-line tripwire:
 *                    even a logic error elsewhere can never recurse-delete a
 *                    git working tree.
 *   4. Minted by us: a `.gstack-staging` marker file (written by
 *                    makeStagingDir) is present. Turns "looks like ours"
 *                    into "was created by us this lineage".
 *
 * Design note (steelman, 2026-06-02): a 4-model review panel split 3-1 on the
 * marker. The dissent argued the structural check alone is sufficient and the
 * marker adds a missing-token failure mode. Adopted anyway because that failure
 * mode is fail-SAFE: a missing marker only forces an unnecessary re-stage
 * (seconds), never a wrong deletion. The asymmetry (the marker can cost work
 * but never data) settles it. The structural check still runs first and is
 * cheap.
 *
 * The deeper, "inevitable" fix lives upstream in gbrain: checkpoint.dir should
 * always be a gbrain-minted staging dir, never CWD. This guard is the
 * mitigation at gstack's own rm -rf boundary; see the companion gbrain issue.
 */
import { realpathSync, existsSync, statSync, lstatSync } from "fs";
import { join, dirname, basename } from "path";

/** Basename prefix every makeStagingDir() directory carries. */
export const STAGING_PREFIX = ".staging-ingest-";

/** Marker file minted inside each staging dir at creation. */
export const STAGING_MARKER = ".gstack-staging";

export interface StagingVerdict {
  ok: boolean;
  /** Precise rejection reason, for actionable logging. Undefined when ok. */
  reason?: string;
  /**
   * The realpath-resolved directory the verdict actually validated. Present only
   * when ok. Callers that delete MUST `rmSync` this path, not the raw input:
   * deleting the canonical path closes the TOCTOU gap where the input is a
   * symlink swapped between this check and the delete (#1802 C5).
   */
  canonicalPath?: string;
}

/**
 * Prove (fail-closed) that `dir` is a gstack-owned ingest staging directory
 * that is safe to recurse-delete or resume into. Returns a structured verdict
 * so callers can log exactly why a path was rejected.
 *
 * @param dir Candidate path (e.g. gbrain checkpoint.dir, or the active staging dir).
 * @param gstackHome Resolved $GSTACK_HOME (injected for testability).
 */
export function checkOwnedStagingDir(dir: string, gstackHome: string): StagingVerdict {
  if (!dir || typeof dir !== "string") {
    return { ok: false, reason: "empty or non-string path" };
  }

  let canon: string;
  let home: string;
  try {
    canon = realpathSync(dir);
    home = realpathSync(gstackHome);
  } catch {
    // Missing path or broken symlink ⇒ cannot prove ownership ⇒ refuse.
    return { ok: false, reason: "unresolvable path (missing dir or broken symlink)" };
  }

  // The target itself must be a directory (not a file/socket/etc named like one).
  try {
    if (!statSync(canon).isDirectory()) {
      return { ok: false, reason: "not a directory" };
    }
  } catch {
    return { ok: false, reason: "unstattable target" };
  }

  if (dirname(canon) !== home) {
    return { ok: false, reason: `not a direct child of GSTACK_HOME (${home})` };
  }
  if (!basename(canon).startsWith(STAGING_PREFIX)) {
    return { ok: false, reason: `basename does not start with "${STAGING_PREFIX}"` };
  }
  if (existsSync(join(canon, ".git"))) {
    // Tripwire: never recurse-delete anything that looks like a git work tree.
    return { ok: false, reason: "path contains .git — refusing to touch a git working tree" };
  }

  // Marker must be a REGULAR FILE we minted, not a directory or symlink that
  // merely shares the name (lstat, not stat, so a symlink can't impersonate it).
  try {
    if (!lstatSync(join(canon, STAGING_MARKER)).isFile()) {
      return { ok: false, reason: `"${STAGING_MARKER}" exists but is not a regular file` };
    }
  } catch {
    return { ok: false, reason: `missing "${STAGING_MARKER}" marker — not minted by makeStagingDir` };
  }

  return { ok: true, canonicalPath: canon };
}

/** Boolean convenience wrapper around {@link checkOwnedStagingDir}. */
export function isOwnedStagingDir(dir: string, gstackHome: string): boolean {
  return checkOwnedStagingDir(dir, gstackHome).ok;
}
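
The mint, check, and delete round trip described in this file might look roughly like the sketch below. It is illustrative only: `makeStagingDirSketch` and `cleanupStagingDirSketch` are hypothetical stand-ins for the real `makeStagingDir()` and `cleanupStagingDir()` (which live elsewhere and are not shown here), `checkOwnedSketch` is a condensed copy of the guard so the example runs on its own, and the use of `mkdtempSync` for naming is an assumption.

```typescript
import {
  mkdtempSync, mkdirSync, writeFileSync, rmSync,
  realpathSync, statSync, lstatSync, existsSync,
} from "node:fs";
import { join, dirname, basename } from "node:path";
import { tmpdir } from "node:os";

const PREFIX = ".staging-ingest-";
const MARKER = ".gstack-staging";

interface Verdict { ok: boolean; reason?: string; canonicalPath?: string }

// Condensed copy of the guard above, inlined so the sketch is self-contained.
function checkOwnedSketch(dir: string, gstackHome: string): Verdict {
  let canon: string;
  let home: string;
  try {
    canon = realpathSync(dir);
    home = realpathSync(gstackHome);
    if (!statSync(canon).isDirectory()) return { ok: false, reason: "not a directory" };
  } catch {
    return { ok: false, reason: "unresolvable path" };
  }
  if (dirname(canon) !== home) return { ok: false, reason: "not a direct child of GSTACK_HOME" };
  if (!basename(canon).startsWith(PREFIX)) return { ok: false, reason: "wrong basename prefix" };
  if (existsSync(join(canon, ".git"))) return { ok: false, reason: "contains .git" };
  try {
    if (!lstatSync(join(canon, MARKER)).isFile()) return { ok: false, reason: "marker not a regular file" };
  } catch {
    return { ok: false, reason: "missing marker" };
  }
  return { ok: true, canonicalPath: canon };
}

// Hypothetical minting step: create the dir, write the marker, and tear the
// dir down again if the marker write fails so no marker-less dir can leak.
function makeStagingDirSketch(gstackHome: string): string {
  const dir = mkdtempSync(join(gstackHome, PREFIX));
  try {
    writeFileSync(join(dir, MARKER), "", { flag: "wx" });
  } catch (err) {
    rmSync(dir, { recursive: true, force: true });
    throw err;
  }
  return dir;
}

// Hypothetical deletion chokepoint: delete the canonical path from the
// verdict, never the raw input.
function cleanupStagingDirSketch(dir: string, gstackHome: string): Verdict {
  const verdict = checkOwnedSketch(dir, gstackHome);
  if (!verdict.ok || !verdict.canonicalPath) {
    console.warn(`refusing to delete ${dir}: ${verdict.reason}`);
    return verdict;
  }
  rmSync(verdict.canonicalPath, { recursive: true, force: true });
  return verdict;
}

// Demo in a throwaway home: a repo-like dir is refused, a minted dir is removed.
const home = mkdtempSync(join(tmpdir(), "gstack-home-"));
const staging = makeStagingDirSketch(home);
const fakeRepo = join(home, "repo");
mkdirSync(join(fakeRepo, ".git"), { recursive: true });

const refused = cleanupStagingDirSketch(fakeRepo, home);
const deleted = cleanupStagingDirSketch(staging, home);
const repoSurvived = existsSync(fakeRepo);
const stagingGone = !existsSync(staging);

rmSync(home, { recursive: true, force: true }); // tidy up the demo home
```

The point of the design is that a poisoned `dir` (here, the fake repo) fails the structural check before any deletion is attempted, while only a directory that was minted with the marker passes all four conditions.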