mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
security: harden migration + context-save after adversarial review
Adversarial review (Claude + Codex, both high confidence) identified 6
critical production-harm findings in the /ship pre-landing pass.
All folded in.
Migration v1.0.1.0.sh hardening:
- Add explicit `[ -z "${HOME:-}" ]` guard. HOME="" survives set -u and
expands paths to /.claude/skills/... which could hit absolute paths
under root/containers/sudo-without-H.
- Add python3 fallback inside resolve_real() (was missing; broken
symlinks silently defeated ownership check).
- Ownership-guard Shape 2 (~/.claude/skills/gstack/checkpoint/). Was
unconditional rm -rf. Now: if symlink, check target resolves inside
gstack; if regular dir, check realpath resolves inside gstack. A
user's hand-edited customization or a symlink pointing outside gstack
is preserved with a notice.
- Use `rm --` and `rm -r --` consistently to resist hostile basenames.
- Use `find -type f -not -name .DS_Store -not -name ._*` instead of
`ls -A | grep`. macOS sidecars no longer mask a legit prefix-mode
install. Strip sidecars explicitly before removing the dir.
context-save/SKILL.md.tmpl:
- Sanitize title in bash, not LLM prose. Allowlist [a-z0-9.-], cap 60
chars, default to "untitled". Closes a prompt-injection surface where
`/context-save $(rm -rf ~)` could propagate into subsequent commands.
- Collision-safe filename. If ${TIMESTAMP}-${SLUG}.md already exists
(same-second double-save with same title), append a 4-char random
suffix. The skill contract says "saved files are append-only" — this
enforces it. Silent overwrite was a data-loss bug.
context-restore/SKILL.md.tmpl:
- Cap `find ... | sort -r` at 20 entries via `| head -20`. A user with
10k+ saved files no longer blows the context window just to pick one.
/context-save list still handles the full-history listing path.
test/skill-e2e-autoplan-dual-voice.test.ts:
- Filter transcript to tool_use / tool_result / assistant entries
before matching, so prompt-text mentions of "plan-ceo-review" don't
force the reachedPhase1 assertion to pass. Phase-1 assertion now
requires completion markers ("Phase 1 complete", "Phase 2 started"),
not mere name occurrence.
- claudeVoiceFired now requires JSON evidence of an Agent tool_use
(name:"Agent" or subagent_type field), not the literal string
"Agent(" which could appear anywhere.
- codexVoiceFired now requires a Bash tool_use with a `codex exec/review`
command string, not prompt-text mentions.
All SKILL.md files regenerated. Golden fixtures updated. bun test: 0
failures across 80+ targeted tests and the full suite.
Review source: /ship Step 11 adversarial pass (claude subagent + codex
exec). Same findings independently surfaced by both reviewers — this is
cross-model high confidence.
This commit is contained in:
@@ -85,20 +85,37 @@ Add a new /greet skill that prints a welcome message.
|
||||
// Accept EITHER outcome as success:
|
||||
// (a) Both voices produced output (ideal case)
|
||||
// (b) Codex unavailable + Claude voice produced output (graceful degrade)
|
||||
// Session runner returns `output` (final assistant message). Search
|
||||
// transcript tool-call inputs/outputs as a broader net for voice fingerprints.
|
||||
const transcriptText = JSON.stringify(result.transcript || []);
|
||||
const out = (result.output ?? '') + '\n' + transcriptText;
|
||||
const claudeVoiceFired = /Claude\s+(CEO|subagent)|claude-subagent|Agent\s*\(|subagent_type/i.test(out);
|
||||
const codexVoiceFired = /codex\s+(exec|review)|CODEX SAYS|\[via:codex\]|codex-plan-review/i.test(out);
|
||||
const codexUnavailable = /\[codex-unavailable\]|AUTH_FAILED|CODEX_NOT_AVAILABLE|codex_cli_missing|Codex CLI not found/i.test(out);
|
||||
// Search ONLY the tool-call structure — NOT the prompt string that went in.
|
||||
// Matching against full transcript is risky because the prompt itself
|
||||
// contains "plan-ceo-review" and other marker strings that would produce
|
||||
// false positives regardless of skill behavior. Filter to tool_result
|
||||
// content + assistant messages emitted DURING execution.
|
||||
const transcript = Array.isArray(result.transcript) ? result.transcript : [];
|
||||
const executionContent = transcript
|
||||
.filter((entry: any) => entry && (entry.type === 'tool_use' || entry.type === 'tool_result' || entry.role === 'assistant'))
|
||||
.map((entry: any) => JSON.stringify(entry))
|
||||
.join('\n');
|
||||
const out = (result.output ?? '') + '\n' + executionContent;
|
||||
|
||||
// Claude voice: require evidence of a dispatched Agent subagent, not
|
||||
// merely the literal string "Agent(" (which could appear in any text).
|
||||
// Task/Agent tool_use entries have name:"Agent" or subagent_type:"..."
|
||||
const claudeVoiceFired = /"name":\s*"Agent"|"subagent_type":\s*"[^"]/.test(out) ||
|
||||
/Claude\s+(CEO|subagent)\s+(review|complete|finished)|claude-subagent\s/i.test(out);
|
||||
// Codex voice: require evidence of codex CLI invocation (command string in
|
||||
// a Bash tool_use), not prompt-text mentions.
|
||||
const codexVoiceFired = /"command":\s*"[^"]*codex\s+(exec|review)/.test(out) ||
|
||||
/CODEX SAYS\s*\(/i.test(out);
|
||||
// Unavailable markers: explicit probe-failure strings emitted by the skill.
|
||||
const codexUnavailable = /\[codex-unavailable\]|AUTH_FAILED\b|CODEX_NOT_AVAILABLE\b|codex_cli_missing|Codex CLI not found/i.test(out);
|
||||
|
||||
expect(claudeVoiceFired).toBe(true);
|
||||
expect(codexVoiceFired || codexUnavailable).toBe(true);
|
||||
|
||||
// Hang protection: if the skill reached Phase 1 at all, our hardening worked.
|
||||
// If it didn't, this is a regression from the pre-wave stdin-deadlock era.
|
||||
const reachedPhase1 = /Phase 1|CEO\s+Review|Strategy\s*&\s*Scope|plan-ceo-review/i.test(out);
|
||||
// Hang protection: require phase completion evidence, not name mentions.
|
||||
// "Phase 1 complete" or a phase-transition marker, not "plan-ceo-review"
|
||||
// as a bare string (which appears in the prompt itself).
|
||||
const reachedPhase1 = /Phase\s+1\s+(complete|done|finished)|CEO\s+Review\s+(complete|done|approved)|Strategy\s*&\s*Scope\s+(complete|done)|Phase\s+2\s+(started|begin)/i.test(out);
|
||||
expect(reachedPhase1).toBe(true);
|
||||
|
||||
logCost('autoplan-dual-voice', result);
|
||||
|
||||
Reference in New Issue
Block a user