gstack/bin/gstack-question-preference
Garry Tan a6fb31726c v1.48.0.0 feat: AskUserQuestion split rule + runtime AUTO_DECIDE carve-out (#1740)
* feat(preamble): add "Handling 5+ options — split, never drop" rule

Agents repeatedly hit Conductor's 4-option AskUserQuestion cap and
silently drop one option to fit, shrinking the user's decision space.
This rule names the bug and gives two compliant shapes: batch into
groups of ≤4 (for coherent alternatives) or split into N sequential
per-option calls (for independent scope items, the default).
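
A minimal sketch of the batching shape, assuming options are plain
whitespace-free tokens. The helper name `batch_options` is hypothetical
and is not part of gstack; only the cap of 4 comes from the rule above.

```bash
# Hypothetical helper: print the options in groups of at most 4,
# one group per line, so no option is ever dropped.
batch_options() {
  local max=4 count=0 line="" opt
  for opt in "$@"; do
    line="${line:+$line }$opt"
    count=$((count + 1))
    if [ "$count" -eq "$max" ]; then
      printf '%s\n' "$line"
      line=""
      count=0
    fi
  done
  # Flush the final, partially filled group (if any).
  if [ -n "$line" ]; then
    printf '%s\n' "$line"
  fi
}

batch_options slack discord teams telegram mattermost
# prints:
#   slack discord teams telegram
#   mattermost
```

Every option appears in exactly one group, which is the property the
rule is protecting.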

Inline preamble subsection is ~15 lines (rule + buckets + pointer).
Full reference with worked examples, Hold/dependency semantics, and
final-summary validation lives in docs/askuserquestion-split.md.
The agent loads the docs file on demand when N>4.

Per-option call shape: D<N>.k header, ELI10, Recommendation, kind-note
(no completeness score — decision actions, not coverage), Include /
Defer / Cut / Hold buckets. Hold stops the chain immediately; the
final D<N>.final call validates dependencies and confirms the
assembled scope.
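
The chain semantics above can be sketched roughly as follows. This is
an illustration only: `run_split_chain` and its `option=bucket`
argument format are hypothetical, and skipping D<N>.final after a Hold
is an assumption, since the text does not say what follows a Hold.

```bash
# Hypothetical sketch: run_split_chain <N> <option>=<bucket>...
# Emits one D<N>.k line per option; a Hold answer stops the chain.
run_split_chain() {
  local n="$1" k=0 pair opt bucket
  shift
  for pair in "$@"; do
    k=$((k + 1))
    opt="${pair%%=*}"
    bucket="${pair#*=}"
    case "$bucket" in
      Include|Defer|Cut)
        printf 'D%s.%s %s -> %s\n' "$n" "$k" "$opt" "$bucket"
        ;;
      Hold)
        printf 'D%s.%s %s -> Hold (chain stopped)\n' "$n" "$k" "$opt"
        return 0
        ;;
      *)
        printf 'unknown bucket: %s\n' "$bucket" >&2
        return 1
        ;;
    esac
  done
  # Assumption: the final confirmation runs only when no option was held.
  printf 'D%s.final confirm assembled scope\n' "$n"
}

run_split_chain 3 slack=Include discord=Hold teams=Cut
# prints:
#   D3.1 slack -> Include
#   D3.2 discord -> Hold (chain stopped)
```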

question_ids: <skill>-split-<option-slug> (kebab-case ASCII, ≤64
chars). Also fixes orphan "12. " prefix on the existing CJK rule.
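
One way to build an id of the <skill>-split-<option-slug> shape is
sketched below. The function name and the slugging steps (lowercase,
collapse non-alphanumerics to a hyphen, truncate to 64) are
assumptions; the commit only fixes the format and the length limit.

```bash
# Hypothetical helper: split_question_id <skill> <option label>
split_question_id() {
  local skill="$1" option="$2" slug id
  # Lowercase, then turn every run of non [a-z0-9] bytes into one hyphen.
  slug=$(printf '%s' "$option" \
    | LC_ALL=C tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  # Enforce the 64-character limit, then drop a hyphen left by truncation.
  id=$(printf '%s' "${skill}-split-${slug}" | cut -c1-64)
  printf '%s\n' "${id%-}"
}

split_question_id plan-ceo-review "Slack Integration"
# prints: plan-ceo-review-split-slack-integration
```

Ids built this way pass the `^[a-z0-9-]+$` and length checks that the
`--write` subcommand applies to `question_id`.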

Tier-2+ skills inherit via the existing resolver. SKILL.md regenerated
for all 41 affected skills + 3 golden fixtures. Net diff per SKILL.md:
~34 lines (vs ~110 for the full inline version).

6 tests pin the inline contract (4-option cap, buckets, D-numbering,
docs pointer, runtime AUTO_DECIDE gate reference, orphan 12 regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(question-pref): runtime AUTO_DECIDE carve-out for *-split-* ids

Split chains (per-option AskUserQuestion calls emitted by the new
"Handling 5+ options" rule) must never be silently auto-approved
via /plan-tune preferences. The user's option set is sacred.

Layer 1 (mechanism): unique <skill>-split-<option-slug> ids prevent
cross-option preference leakage. Layer 2 (this commit): the runtime
checker `gstack-question-preference --check` detects any id matching
*-split-* and forces ASK_NORMALLY even when never-ask or
ask-only-for-one-way preferences exist for that exact id. An
explanatory note tells the user their preference was bypassed and why.
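
The precedence described above can be sketched in plain shell. This is
a simplified mirror of the checker, not the checker itself:
`check_decision` is a hypothetical name, the `one_way` argument stands
in for the one-way-door lookup, and the explanatory NOTE lines are
omitted.

```bash
# Hypothetical sketch: check_decision <question_id> <preference|""> <one_way: yes|no>
check_decision() {
  local qid="$1" pref="${2:-}" one_way="${3:-no}"
  # 1. One-way doors always ask; safety overrides preferences.
  if [ "$one_way" = "yes" ]; then
    echo "ASK_NORMALLY"
    return 0
  fi
  # 2. Split-chain carve-out: any id containing "-split-" always asks.
  case "$qid" in
    *-split-*)
      echo "ASK_NORMALLY"
      return 0
      ;;
  esac
  # 3. Otherwise the stored preference decides.
  case "$pref" in
    never-ask|ask-only-for-one-way) echo "AUTO_DECIDE" ;;
    *) echo "ASK_NORMALLY" ;;
  esac
}

check_decision plan-ceo-review-split-slack never-ask no   # prints: ASK_NORMALLY
check_decision splitter-config never-ask no               # prints: AUTO_DECIDE
```

The second call is the negative case: the id contains the word
"split" but not the `-split-` segment, so the preference still applies.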

7 tests pin the carve-out: no-pref baseline, never-ask override,
explanatory note text, ask-only-for-one-way override, always-ask
(no note), non-split id containing "split" word (negative case for
regex specificity), multi-skill split id formats.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): split-overflow regression for /plan-ceo-review

Periodic-tier E2E test that catches the original failure mode the
user complained about: 5+ options for ONE decision must split into
N sequential AskUserQuestion calls, not drop one to fit Conductor's
4-option cap.

Fixture: 5 independent chat-platform integration candidates
(Slack/Discord/Teams/Telegram/Mattermost), each carrying its own
include/defer/cut decision. Floor = 4 review-phase AUQs (standard
[N-1] tolerance band). The pre-fix behavior (show 4 options, silently
drop the fifth) fails this floor.
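
The floor arithmetic, as a small sketch. `meets_split_floor` is a
hypothetical helper, and the count of 1 for the pre-fix run assumes it
asked a single question for this decision.

```bash
# Hypothetical helper: meets_split_floor <option_count> <observed_AUQ_count>
# Succeeds when the observed call count reaches the N-1 floor.
meets_split_floor() {
  local options="$1" observed="$2" floor
  floor=$((options - 1))
  [ "$observed" -ge "$floor" ]
}

meets_split_floor 5 5 && echo "5 calls: pass"   # floor is 4
meets_split_floor 5 1 || echo "1 call: fail"    # assumed pre-fix shape
```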

Wired into test/helpers/touchfiles.ts: tier periodic, depends on
plan-ceo-review/**, the new preamble subsection, the question-pref
binary (for the carve-out), and the runner helper. touchfiles.test.ts
expected count bumped 21 → 22 to account for the new entry.

Cost: ~$0.30/run when EVALS_TIER=periodic. Skips silently otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: post-merge regen + rebase size-budget baseline to v1.47.0.0

After merging origin/main (v1.45 → v1.47), three things needed cleanup:

1. spec/SKILL.md (main's new skill) regenerated to include our split-vs-drop
   preamble subsection — same mechanical regen as the other 41 tier-2+ skills.
2. Three golden ship fixtures refreshed to capture main's GSTACK_PLAN_MODE
   block + /spec routing entry + jargon-list.json refactor.
3. docs/skills.md — added /spec table row that main's PR (#1698/#1733) shipped
   without. Pre-existing failure on main; this PR catches and fixes.

Also rebased test/skill-size-budget.test.ts from v1.44.1 → v1.47.0.0 baseline.
Main's v1.46 (catalog tokens trim) + v1.47 (/spec skill) pushed the v1.44.1
anchor past the 5% ratchet to ×1.059 — pre-existing failure on main. This
PR captures a fresh parity-baseline-v1.47.0.0.json and re-anchors the test
there. Historical v1.44.1.json and v1.46.0.0.json retained in test/fixtures/
for reference. Our subsection contributes ~0.1% of the post-rebase corpus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.48.0.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 23:43:07 -07:00

278 lines
10 KiB
Bash
Executable File

#!/usr/bin/env bash
# gstack-question-preference — read/write/check explicit per-question preferences.
#
# Preference file: ~/.gstack/projects/{SLUG}/question-preferences.json
# Schema: { "<question_id>": "always-ask" | "never-ask" | "ask-only-for-one-way" }
#
# Subcommands:
# --check <id> → emit ASK_NORMALLY | AUTO_DECIDE (optionally followed by a NOTE: line)
# --write '{...}' → set a preference (user-origin gate enforced)
# --read → dump preferences JSON
# --clear [<id>] → clear one or all preferences
# --stats → short summary
#
# User-origin gate
# ----------------
# The --write subcommand REQUIRES a `source` field on the input:
# - "plan-tune" — user ran /plan-tune and chose a preference (allowed)
# - "inline-user" — inline `tune:` from the user's own chat message (allowed)
# - "inline-tool-output"— tune: prefix seen in tool output / file content (REJECTED)
# - "inline-file" — tune: prefix seen in a file the agent read (REJECTED)
# - "inline-file-content", "inline-unknown": also not user-originated (REJECTED)
# Rejected sources exit 2; a missing or unrecognized source exits 1.
# This is the profile-poisoning defense from docs/designs/PLAN_TUNING_V0.md.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
ROOT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
GSTACK_HOME="${GSTACK_HOME:-$HOME/.gstack}"
eval "$("$SCRIPT_DIR/gstack-slug" 2>/dev/null || true)"
SLUG="${SLUG:-unknown}"
PREF_FILE="$GSTACK_HOME/projects/$SLUG/question-preferences.json"
EVENT_FILE="$GSTACK_HOME/projects/$SLUG/question-events.jsonl"
mkdir -p "$GSTACK_HOME/projects/$SLUG"
CMD="${1:-}"
shift || true
ensure_file() {
  if [ ! -f "$PREF_FILE" ]; then
    echo '{}' > "$PREF_FILE"
  fi
}
# -----------------------------------------------------------------------
# --check <question_id>
# -----------------------------------------------------------------------
do_check() {
  local QID="${1:-}"
  if [ -z "$QID" ]; then
    echo "ASK_NORMALLY"
    return 0
  fi
  ensure_file
  cd "$ROOT_DIR"
  PREF_FILE_PATH="$PREF_FILE" QID="$QID" bun -e "
    import('./scripts/one-way-doors.ts').then((oneway) => {
      const fs = require('fs');
      const qid = process.env.QID;
      const prefs = JSON.parse(fs.readFileSync(process.env.PREF_FILE_PATH, 'utf-8'));
      const pref = prefs[qid];
      // Always check one-way status first — safety overrides preferences.
      const oneWay = oneway.isOneWayDoor({ question_id: qid });
      if (oneWay) {
        console.log('ASK_NORMALLY');
        if (pref === 'never-ask') {
          console.log('NOTE: one-way door overrides your never-ask preference for safety.');
        }
        return;
      }
      // Split-chain carve-out: per-option calls in N-option splits emit
      // question_ids of the form <skill>-split-<option-slug>. These are
      // NEVER AUTO_DECIDE-eligible regardless of stored preferences — the
      // whole point of splitting is restoring user sovereignty over the
      // option set. See scripts/resolvers/preamble/generate-ask-user-format.ts
      // \"Handling 5+ options — split, never drop\" for the surrounding
      // mechanism that generates these ids.
      if (/-split-/.test(qid)) {
        console.log('ASK_NORMALLY');
        if (pref === 'never-ask' || pref === 'ask-only-for-one-way') {
          console.log('NOTE: split-chain per-option calls always ASK_NORMALLY; your ' + pref + ' preference does not apply to options inside a sequential split.');
        }
        return;
      }
      switch (pref) {
        case 'never-ask':
          console.log('AUTO_DECIDE');
          break;
        case 'ask-only-for-one-way':
          // Not one-way (we checked above) — auto-decide this two-way question.
          console.log('AUTO_DECIDE');
          break;
        case 'always-ask':
        case undefined:
        case null:
          console.log('ASK_NORMALLY');
          break;
        default:
          console.log('ASK_NORMALLY');
          console.log('NOTE: unknown preference value: ' + pref);
      }
    }).catch(err => { console.error('check:', err.message); process.exit(1); });
  "
}
# -----------------------------------------------------------------------
# --write '{...}' (with user-origin gate)
# -----------------------------------------------------------------------
do_write() {
  local INPUT="${1:-}"
  if [ -z "$INPUT" ]; then
    echo "gstack-question-preference: --write requires a JSON payload" >&2
    exit 1
  fi
  ensure_file
  local TMPERR
  TMPERR=$(mktemp)
  # Use function-local cleanup via RETURN trap so variable lookup only happens
  # while the function is on the stack (avoids EXIT-trap unbound-var race).
  trap "rm -f '$TMPERR'" RETURN
  set +e
  local RESULT
  RESULT=$(printf '%s' "$INPUT" | PREF_FILE_PATH="$PREF_FILE" EVENT_FILE_PATH="$EVENT_FILE" bun -e "
    const fs = require('fs');
    const raw = await Bun.stdin.text();
    let j;
    try { j = JSON.parse(raw); } catch { process.stderr.write('gstack-question-preference: invalid JSON\n'); process.exit(1); }
    // Required: question_id (kebab-case, <=64)
    if (!j.question_id || !/^[a-z0-9-]+\$/.test(j.question_id) || j.question_id.length > 64) {
      process.stderr.write('gstack-question-preference: invalid question_id\n');
      process.exit(1);
    }
    // Required: preference
    const ALLOWED_PREFS = ['always-ask', 'never-ask', 'ask-only-for-one-way'];
    if (!ALLOWED_PREFS.includes(j.preference)) {
      process.stderr.write('gstack-question-preference: invalid preference (must be one of: ' + ALLOWED_PREFS.join(', ') + ')\n');
      process.exit(1);
    }
    // user-origin gate — REQUIRED on every write.
    // See docs/designs/PLAN_TUNING_V0.md §Security model
    const ALLOWED_SOURCES = ['plan-tune', 'inline-user'];
    const REJECTED_SOURCES = ['inline-tool-output', 'inline-file', 'inline-file-content', 'inline-unknown'];
    if (!j.source) {
      process.stderr.write('gstack-question-preference: source field required (one of: ' + ALLOWED_SOURCES.join(', ') + ')\n');
      process.exit(1);
    }
    if (REJECTED_SOURCES.includes(j.source)) {
      process.stderr.write('gstack-question-preference: rejected — source \"' + j.source + '\" is not user-originated (profile poisoning defense)\n');
      process.exit(2);
    }
    if (!ALLOWED_SOURCES.includes(j.source)) {
      process.stderr.write('gstack-question-preference: invalid source \"' + j.source + '\"; allowed: ' + ALLOWED_SOURCES.join(', ') + '\n');
      process.exit(1);
    }
    // Optional free_text — sanitize (no injection patterns, no newlines, <=300 chars)
    if (j.free_text !== undefined) {
      if (typeof j.free_text !== 'string') {
        process.stderr.write('gstack-question-preference: free_text must be string\n');
        process.exit(1);
      }
      if (j.free_text.length > 300) j.free_text = j.free_text.slice(0, 300);
      j.free_text = j.free_text.replace(/\n+/g, ' ');
      const INJECTION_PATTERNS = [
        /ignore\s+(all\s+)?previous\s+(instructions|context|rules)/i,
        /you\s+are\s+now\s+/i,
        /override[:\s]/i,
        /\bsystem\s*:/i,
        /\bassistant\s*:/i,
        /do\s+not\s+(report|flag|mention)/i,
      ];
      for (const pat of INJECTION_PATTERNS) {
        if (pat.test(j.free_text)) {
          process.stderr.write('gstack-question-preference: free_text contains injection-like content, rejected\n');
          process.exit(1);
        }
      }
    }
    // Write to preferences file
    const prefs = JSON.parse(fs.readFileSync(process.env.PREF_FILE_PATH, 'utf-8'));
    prefs[j.question_id] = j.preference;
    fs.writeFileSync(process.env.PREF_FILE_PATH, JSON.stringify(prefs, null, 2));
    // Also append a record to question-events.jsonl for audit + derivation.
    const evt = {
      ts: new Date().toISOString(),
      event_type: 'preference-set',
      question_id: j.question_id,
      preference: j.preference,
      source: j.source,
      ...(j.free_text ? { free_text: j.free_text } : {}),
    };
    fs.appendFileSync(process.env.EVENT_FILE_PATH, JSON.stringify(evt) + '\n');
    console.log('OK: ' + j.question_id + ' → ' + j.preference + ' (source: ' + j.source + ')');
  " 2>"$TMPERR")
  local RC=$?
  set -e
  if [ $RC -ne 0 ]; then
    cat "$TMPERR" >&2
    # exit does not fire the RETURN trap, so remove the temp file explicitly.
    rm -f "$TMPERR"
    exit $RC
  fi
  echo "$RESULT"
}
# -----------------------------------------------------------------------
# --read
# -----------------------------------------------------------------------
do_read() {
  ensure_file
  cat "$PREF_FILE"
}
# -----------------------------------------------------------------------
# --clear [<id>]
# -----------------------------------------------------------------------
do_clear() {
  local QID="${1:-}"
  ensure_file
  if [ -z "$QID" ]; then
    echo '{}' > "$PREF_FILE"
    echo "OK: cleared all preferences"
  else
    PREF_FILE_PATH="$PREF_FILE" QID="$QID" bun -e "
      const fs = require('fs');
      const prefs = JSON.parse(fs.readFileSync(process.env.PREF_FILE_PATH, 'utf-8'));
      if (prefs[process.env.QID] !== undefined) {
        delete prefs[process.env.QID];
        fs.writeFileSync(process.env.PREF_FILE_PATH, JSON.stringify(prefs, null, 2));
        console.log('OK: cleared ' + process.env.QID);
      } else {
        console.log('NOOP: no preference set for ' + process.env.QID);
      }
    "
  fi
}
# -----------------------------------------------------------------------
# --stats
# -----------------------------------------------------------------------
do_stats() {
  ensure_file
  # Feed the preferences file to bun on stdin (no separate cat process needed).
  bun -e "
    const prefs = JSON.parse(await Bun.stdin.text());
    const entries = Object.entries(prefs);
    const counts = { 'always-ask': 0, 'never-ask': 0, 'ask-only-for-one-way': 0, other: 0 };
    for (const [, v] of entries) {
      if (counts[v] !== undefined) counts[v]++;
      else counts.other++;
    }
    console.log('TOTAL: ' + entries.length);
    console.log('ALWAYS_ASK: ' + counts['always-ask']);
    console.log('NEVER_ASK: ' + counts['never-ask']);
    console.log('ASK_ONLY_ONE_WAY: ' + counts['ask-only-for-one-way']);
    if (counts.other) console.log('OTHER: ' + counts.other);
  " < "$PREF_FILE"
}
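case "$CMD" in
  --check) do_check "$@" ;;
  --write) do_write "$@" ;;
  --read|"") do_read ;;
  --clear) do_clear "$@" ;;
  --stats) do_stats ;;
  # Print the header comment block without the shebang or the `set` line.
  # sed -E is used because the `\?` escape is a GNU extension that BSD sed lacks.
  --help|-h) sed -n '2,/^set -euo/p' "$0" | sed -E -e '/^set -euo/d' -e 's/^# ?//' ;;
  *)
    echo "gstack-question-preference: unknown subcommand '$CMD'" >&2
    exit 1
    ;;
esac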