security: harden migration + context-save after adversarial review

Adversarial review (Claude + Codex, both high confidence) identified 6
critical production-harm findings in the /ship pre-landing pass.
All folded in.

Migration v1.0.1.0.sh hardening:
- Add explicit `[ -z "${HOME:-}" ]` guard. HOME="" survives set -u and
  expands paths to /.claude/skills/... which could hit absolute paths
  under root/containers/sudo-without-H.
- Add python3 fallback inside resolve_real() (was missing; broken
  symlinks silently defeated ownership check).
- Ownership-guard Shape 2 (~/.claude/skills/gstack/checkpoint/). Was
  unconditional rm -rf. Now: if symlink, check target resolves inside
  gstack; if regular dir, check realpath resolves inside gstack. A
  user's hand-edited customization or a symlink pointing outside gstack
  is preserved with a notice.
- Use `rm --` and `rm -r --` consistently to resist hostile basenames.
- Use `find -type f -not -name .DS_Store -not -name ._*` instead of
  `ls -A | grep`. macOS sidecars no longer mask a legit prefix-mode
  install. Strip sidecars explicitly before removing the dir.

context-save/SKILL.md.tmpl:
- Sanitize title in bash, not LLM prose. Allowlist [a-z0-9.-], cap 60
  chars, default to "untitled". Closes a prompt-injection surface where
  `/context-save $(rm -rf ~)` could propagate into subsequent commands.
- Collision-safe filename. If ${TIMESTAMP}-${SLUG}.md already exists
  (same-second double-save with same title), append a 4-char random
  suffix. The skill contract says "saved files are append-only" — this
  enforces it. Silent overwrite was a data-loss bug.

context-restore/SKILL.md.tmpl:
- Cap `find ... | sort -r` at 20 entries via `| head -20`. A user with
  10k+ saved files no longer blows the context window just to pick one.
  /context-save list still handles the full-history listing path.

test/skill-e2e-autoplan-dual-voice.test.ts:
- Filter transcript to tool_use / tool_result / assistant entries
  before matching, so prompt-text mentions of "plan-ceo-review" don't
  force the reachedPhase1 assertion to pass. Phase-1 assertion now
  requires completion markers ("Phase 1 complete", "Phase 2 started"),
  not mere name occurrence.
- claudeVoiceFired now requires JSON evidence of an Agent tool_use
  (name:"Agent" or subagent_type field), not the literal string
  "Agent(" which could appear anywhere.
- codexVoiceFired now requires a Bash tool_use with a `codex exec/review`
  command string, not prompt-text mentions.

All SKILL.md files regenerated. Golden fixtures updated. bun test: 0
failures across 80+ targeted tests and the full suite.

Review source: /ship Step 11 adversarial pass (claude subagent + codex
exec). Same findings independently surfaced by both reviewers — this is
cross-model high confidence.
This commit is contained in:
Garry Tan
2026-04-18 23:28:12 +08:00
parent 6f67406e01
commit 3df8ea8695
6 changed files with 135 additions and 45 deletions
+3 -1
View File
@@ -764,7 +764,9 @@ else
# copies/rsync). Filesystem mtime drifts and is not authoritative.
# 2. On macOS, `find ... | xargs ls -1t` with zero results falls back to
# listing cwd. `sort -r` on empty input cleanly returns nothing.
FILES=$(find "$CHECKPOINT_DIR" -maxdepth 1 -name "*.md" -type f 2>/dev/null | sort -r)
# Cap at 20 most recent: a user with 10k saved files shouldn't blow the
# context window just listing them. /context-save list handles pagination.
FILES=$(find "$CHECKPOINT_DIR" -maxdepth 1 -name "*.md" -type f 2>/dev/null | sort -r | head -20)
if [ -z "$FILES" ]; then
echo "NO_CHECKPOINTS"
else
+3 -1
View File
@@ -71,7 +71,9 @@ else
# copies/rsync). Filesystem mtime drifts and is not authoritative.
# 2. On macOS, `find ... | xargs ls -1t` with zero results falls back to
# listing cwd. `sort -r` on empty input cleanly returns nothing.
FILES=$(find "$CHECKPOINT_DIR" -maxdepth 1 -name "*.md" -type f 2>/dev/null | sort -r)
# Cap at 20 most recent: a user with 10k saved files shouldn't blow the
# context window just listing them. /context-save list handles pagination.
FILES=$(find "$CHECKPOINT_DIR" -maxdepth 1 -name "*.md" -type f 2>/dev/null | sort -r | head -20)
if [ -z "$FILES" ]; then
echo "NO_CHECKPOINTS"
else
+21 -3
View File
@@ -805,21 +805,39 @@ saved file.
### Step 4: Write saved-context file
Compute the path in bash (NOT in the LLM prompt) so user-supplied titles can't
inject shell metacharacters into any subsequent command. The sanitizer is an
allowlist: only `a-z 0-9 - .` survive.
```bash
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)" && mkdir -p ~/.gstack/projects/$SLUG
CHECKPOINT_DIR="$HOME/.gstack/projects/$SLUG/checkpoints"
mkdir -p "$CHECKPOINT_DIR"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
# Bash-side title sanitize. Pass the raw title as $1 when running this block.
# Example: TITLE_RAW="wintermute progress" bash -c '...'
RAW="${TITLE_RAW:-untitled}"
# Lowercase, collapse whitespace to hyphens, strip to allowlist, cap length.
TITLE_SLUG=$(printf '%s' "$RAW" | tr '[:upper:]' '[:lower:]' | tr -s ' \t' '-' | tr -cd 'a-z0-9.-' | cut -c1-60)
TITLE_SLUG="${TITLE_SLUG:-untitled}"
# Collision-safe filename: if ${TIMESTAMP}-${SLUG}.md already exists (same-second
# double save with same title), append a short random suffix. Filenames are
# append-only — never overwrite.
FILE="${CHECKPOINT_DIR}/${TIMESTAMP}-${TITLE_SLUG}.md"
if [ -e "$FILE" ]; then
SUFFIX=$(LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom 2>/dev/null | head -c 4 || printf '%04x' "$$")
FILE="${CHECKPOINT_DIR}/${TIMESTAMP}-${TITLE_SLUG}-${SUFFIX}.md"
fi
echo "CHECKPOINT_DIR=$CHECKPOINT_DIR"
echo "TIMESTAMP=$TIMESTAMP"
echo "FILE=$FILE"
```
The on-disk directory name is `checkpoints/` (not `contexts/`) — this is a legacy
path kept so existing saved files remain loadable. Users never see it.
Write the file to `{CHECKPOINT_DIR}/{TIMESTAMP}-{title-slug}.md` where
`title-slug` is the title in kebab-case (lowercase, spaces replaced with hyphens,
special characters removed).
Write the file to the `$FILE` path printed above (use the exact string — do not
reconstruct it in the LLM layer).
The file format:
+21 -3
View File
@@ -112,21 +112,39 @@ saved file.
### Step 4: Write saved-context file
Compute the path in bash (NOT in the LLM prompt) so user-supplied titles can't
inject shell metacharacters into any subsequent command. The sanitizer is an
allowlist: only `a-z 0-9 - .` survive.
```bash
{{SLUG_SETUP}}
CHECKPOINT_DIR="$HOME/.gstack/projects/$SLUG/checkpoints"
mkdir -p "$CHECKPOINT_DIR"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
# Bash-side title sanitize. Pass the raw title as $1 when running this block.
# Example: TITLE_RAW="wintermute progress" bash -c '...'
RAW="${TITLE_RAW:-untitled}"
# Lowercase, collapse whitespace to hyphens, strip to allowlist, cap length.
TITLE_SLUG=$(printf '%s' "$RAW" | tr '[:upper:]' '[:lower:]' | tr -s ' \t' '-' | tr -cd 'a-z0-9.-' | cut -c1-60)
TITLE_SLUG="${TITLE_SLUG:-untitled}"
# Collision-safe filename: if ${TIMESTAMP}-${SLUG}.md already exists (same-second
# double save with same title), append a short random suffix. Filenames are
# append-only — never overwrite.
FILE="${CHECKPOINT_DIR}/${TIMESTAMP}-${TITLE_SLUG}.md"
if [ -e "$FILE" ]; then
SUFFIX=$(LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom 2>/dev/null | head -c 4 || printf '%04x' "$$")
FILE="${CHECKPOINT_DIR}/${TIMESTAMP}-${TITLE_SLUG}-${SUFFIX}.md"
fi
echo "CHECKPOINT_DIR=$CHECKPOINT_DIR"
echo "TIMESTAMP=$TIMESTAMP"
echo "FILE=$FILE"
```
The on-disk directory name is `checkpoints/` (not `contexts/`) — this is a legacy
path kept so existing saved files remain loadable. Users never see it.
Write the file to `{CHECKPOINT_DIR}/{TIMESTAMP}-{title-slug}.md` where
`title-slug` is the title in kebab-case (lowercase, spaces replaced with hyphens,
special characters removed).
Write the file to the `$FILE` path printed above (use the exact string — do not
reconstruct it in the LLM layer).
The file format:
+60 -27
View File
@@ -20,33 +20,42 @@
# Idempotent: missing paths are no-ops.
set -euo pipefail
# Guard: refuse to run if HOME is unset or empty. With `set -u`, unset HOME
# errors out, but HOME="" (possible under sudo-without-H, systemd units, some
# CI runners) survives and produces dangerous absolute paths like
# "/.claude/skills/...". Abort cleanly.
if [ -z "${HOME:-}" ]; then
echo " [v1.0.1.0] HOME is unset or empty — skipping migration." >&2
exit 0
fi
SKILLS_DIR="${HOME}/.claude/skills"
OLD_TOPLEVEL="${SKILLS_DIR}/checkpoint"
OLD_NAMESPACED="${SKILLS_DIR}/gstack/checkpoint"
GSTACK_ROOT_REAL=""
# Helper: canonical-path a target (symlink-safe). Prints the resolved path, or
# empty on failure (broken symlink, ENOENT, ELOOP). Both realpath AND the python3
# fallback are tried — a single tool failure shouldn't defeat the ownership
# check. Returns empty string if both fail.
resolve_real() {
local target="$1"
local out=""
if command -v realpath >/dev/null 2>&1; then
out=$(realpath "$target" 2>/dev/null || true)
fi
if [ -z "$out" ] && command -v python3 >/dev/null 2>&1; then
out=$(python3 -c 'import os,sys;print(os.path.realpath(sys.argv[1]))' "$target" 2>/dev/null || true)
fi
printf '%s' "$out"
}
# Resolve the canonical path of the gstack skills root. If gstack isn't
# installed here, there's nothing to migrate.
if [ -d "${SKILLS_DIR}/gstack" ]; then
# Portable realpath: macOS BSD `readlink` lacks -f. Fall back to python3.
if command -v realpath >/dev/null 2>&1; then
GSTACK_ROOT_REAL=$(realpath "${SKILLS_DIR}/gstack" 2>/dev/null || true)
fi
if [ -z "$GSTACK_ROOT_REAL" ]; then
GSTACK_ROOT_REAL=$(python3 -c 'import os,sys;print(os.path.realpath(sys.argv[1]))' "${SKILLS_DIR}/gstack" 2>/dev/null || true)
fi
GSTACK_ROOT_REAL=$(resolve_real "${SKILLS_DIR}/gstack")
fi
# Helper: canonical-path a target (symlink-safe). Prints the resolved path.
resolve_real() {
local target="$1"
if command -v realpath >/dev/null 2>&1; then
realpath "$target" 2>/dev/null || true
return
fi
python3 -c 'import os,sys;print(os.path.realpath(sys.argv[1]))' "$target" 2>/dev/null || true
}
# Helper: does $1 (canonical path) live inside $2 (canonical path)?
path_inside() {
local inner="$1"
@@ -65,20 +74,24 @@ if [ -L "$OLD_TOPLEVEL" ]; then
# Directory symlink (or file symlink). Canonicalize and check ownership.
target_real=$(resolve_real "$OLD_TOPLEVEL")
if [ -n "$GSTACK_ROOT_REAL" ] && path_inside "$target_real" "$GSTACK_ROOT_REAL"; then
rm "$OLD_TOPLEVEL"
rm -- "$OLD_TOPLEVEL"
echo " [v1.0.1.0] Removed stale /checkpoint symlink (was shadowing Claude Code's /rewind alias)."
removed_any=1
else
echo " [v1.0.1.0] Leaving $OLD_TOPLEVEL alone — symlink target is outside gstack."
echo " [v1.0.1.0] Leaving $OLD_TOPLEVEL alone — symlink target is outside gstack (or unresolvable)."
fi
elif [ -d "$OLD_TOPLEVEL" ]; then
# Regular directory. Only remove if it contains exactly one file named
# SKILL.md that's a symlink into gstack (gstack's prefix-install shape).
entries=$(ls -A "$OLD_TOPLEVEL" 2>/dev/null)
if [ "$entries" = "SKILL.md" ] && [ -L "$OLD_TOPLEVEL/SKILL.md" ]; then
# Use find to count real files, ignoring .DS_Store (macOS sidecars).
file_count=$(find "$OLD_TOPLEVEL" -maxdepth 1 -type f -not -name '.DS_Store' -not -name '._*' 2>/dev/null | wc -l | tr -d ' ')
symlink_count=$(find "$OLD_TOPLEVEL" -maxdepth 1 -type l 2>/dev/null | wc -l | tr -d ' ')
if [ "$file_count" = "0" ] && [ "$symlink_count" = "1" ] && [ -L "$OLD_TOPLEVEL/SKILL.md" ]; then
target_real=$(resolve_real "$OLD_TOPLEVEL/SKILL.md")
if [ -n "$GSTACK_ROOT_REAL" ] && path_inside "$target_real" "$GSTACK_ROOT_REAL"; then
rm -r "$OLD_TOPLEVEL"
# Strip macOS sidecars first (not user content), then remove the dir.
find "$OLD_TOPLEVEL" -maxdepth 1 \( -name '.DS_Store' -o -name '._*' \) -type f -delete 2>/dev/null || true
rm -r -- "$OLD_TOPLEVEL"
echo " [v1.0.1.0] Removed stale /checkpoint install directory (gstack prefix-mode)."
removed_any=1
else
@@ -90,11 +103,31 @@ elif [ -d "$OLD_TOPLEVEL" ]; then
fi
# Missing → no-op (idempotency).
# --- Shape 2: ~/.claude/skills/gstack/checkpoint/ (gstack owns this dir unconditionally)
if [ -d "$OLD_NAMESPACED" ] || [ -L "$OLD_NAMESPACED" ]; then
rm -rf "$OLD_NAMESPACED"
echo " [v1.0.1.0] Removed stale ~/.claude/skills/gstack/checkpoint/ (replaced by context-save + context-restore)."
removed_any=1
# --- Shape 2: ~/.claude/skills/gstack/checkpoint/
# Ownership guard applies here too: only remove if this path resolves inside the
# gstack skills root. If a user replaced the directory with a symlink pointing
# elsewhere (e.g., at their own fork), respect it.
if [ -L "$OLD_NAMESPACED" ]; then
target_real=$(resolve_real "$OLD_NAMESPACED")
if [ -n "$GSTACK_ROOT_REAL" ] && path_inside "$target_real" "$GSTACK_ROOT_REAL"; then
rm -- "$OLD_NAMESPACED"
echo " [v1.0.1.0] Removed stale ~/.claude/skills/gstack/checkpoint symlink."
removed_any=1
else
echo " [v1.0.1.0] Leaving $OLD_NAMESPACED alone — symlink target is outside gstack."
fi
elif [ -d "$OLD_NAMESPACED" ]; then
# Regular directory. This is the gstack-prefix install location. Check that
# it resolves to a path inside the gstack root (it should, unless someone
# hand-edited the tree).
target_real=$(resolve_real "$OLD_NAMESPACED")
if [ -n "$GSTACK_ROOT_REAL" ] && path_inside "$target_real" "$GSTACK_ROOT_REAL"; then
rm -rf -- "$OLD_NAMESPACED"
echo " [v1.0.1.0] Removed stale ~/.claude/skills/gstack/checkpoint/ (replaced by context-save + context-restore)."
removed_any=1
else
echo " [v1.0.1.0] Leaving $OLD_NAMESPACED alone — resolves outside gstack."
fi
fi
if [ "$removed_any" = "1" ]; then
+27 -10
View File
@@ -85,20 +85,37 @@ Add a new /greet skill that prints a welcome message.
// Accept EITHER outcome as success:
// (a) Both voices produced output (ideal case)
// (b) Codex unavailable + Claude voice produced output (graceful degrade)
// Session runner returns `output` (final assistant message). Search
// transcript tool-call inputs/outputs as a broader net for voice fingerprints.
const transcriptText = JSON.stringify(result.transcript || []);
const out = (result.output ?? '') + '\n' + transcriptText;
const claudeVoiceFired = /Claude\s+(CEO|subagent)|claude-subagent|Agent\s*\(|subagent_type/i.test(out);
const codexVoiceFired = /codex\s+(exec|review)|CODEX SAYS|\[via:codex\]|codex-plan-review/i.test(out);
const codexUnavailable = /\[codex-unavailable\]|AUTH_FAILED|CODEX_NOT_AVAILABLE|codex_cli_missing|Codex CLI not found/i.test(out);
// Search ONLY the tool-call structure — NOT the prompt string that went in.
// Matching against full transcript is risky because the prompt itself
// contains "plan-ceo-review" and other marker strings that would produce
// false positives regardless of skill behavior. Filter to tool_result
// content + assistant messages emitted DURING execution.
const transcript = Array.isArray(result.transcript) ? result.transcript : [];
const executionContent = transcript
.filter((entry: any) => entry && (entry.type === 'tool_use' || entry.type === 'tool_result' || entry.role === 'assistant'))
.map((entry: any) => JSON.stringify(entry))
.join('\n');
const out = (result.output ?? '') + '\n' + executionContent;
// Claude voice: require evidence of a dispatched Agent subagent, not
// merely the literal string "Agent(" (which could appear in any text).
// Task/Agent tool_use entries have name:"Agent" or subagent_type:"..."
const claudeVoiceFired = /"name":\s*"Agent"|"subagent_type":\s*"[^"]/.test(out) ||
/Claude\s+(CEO|subagent)\s+(review|complete|finished)|claude-subagent\s/i.test(out);
// Codex voice: require evidence of codex CLI invocation (command string in
// a Bash tool_use), not prompt-text mentions.
const codexVoiceFired = /"command":\s*"[^"]*codex\s+(exec|review)/.test(out) ||
/CODEX SAYS\s*\(/i.test(out);
// Unavailable markers: explicit probe-failure strings emitted by the skill.
const codexUnavailable = /\[codex-unavailable\]|AUTH_FAILED\b|CODEX_NOT_AVAILABLE\b|codex_cli_missing|Codex CLI not found/i.test(out);
expect(claudeVoiceFired).toBe(true);
expect(codexVoiceFired || codexUnavailable).toBe(true);
// Hang protection: if the skill reached Phase 1 at all, our hardening worked.
// If it didn't, this is a regression from the pre-wave stdin-deadlock era.
const reachedPhase1 = /Phase 1|CEO\s+Review|Strategy\s*&\s*Scope|plan-ceo-review/i.test(out);
// Hang protection: require phase completion evidence, not name mentions.
// "Phase 1 complete" or a phase-transition marker, not "plan-ceo-review"
// as a bare string (which appears in the prompt itself).
const reachedPhase1 = /Phase\s+1\s+(complete|done|finished)|CEO\s+Review\s+(complete|done|approved)|Strategy\s*&\s*Scope\s+(complete|done)|Phase\s+2\s+(started|begin)/i.test(out);
expect(reachedPhase1).toBe(true);
logCost('autoplan-dual-voice', result);