mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
22a4451e0e
* chore: regenerate stale ship golden fixtures
Golden fixtures were missing the VENDORED_GSTACK preamble section that
landed on main. Regression tests failed on all three hosts (claude, codex,
factory). Regenerated from current preamble output.
No code changes, unblocks test suite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: anti-slop design constraints + delete duplicate constants
Tightens design-consultation and design-shotgun to push back on the
convergence traps every AI design tool falls into.
Changes:
- scripts/resolvers/constants.ts: add "system-ui as primary font" to
AI_SLOP_BLACKLIST. Document Space Grotesk as the new "safe alternative
to Inter" convergence trap alongside the existing overused fonts.
- scripts/gen-skill-docs.ts: delete duplicate AI slop constants block
(dead code — scripts/resolvers/constants.ts is the live source).
Prevents drift between the two definitions.
- design-consultation/SKILL.md.tmpl: add Space Grotesk + system-ui to
overused/slop lists. Add "anti-convergence directive" — vary across
generations in the same project. Add Phase 1 "memorable-thing forcing
question" (what's the one thing someone will remember?). Add Phase 5
"would a human designer be embarrassed by this?" self-gate before
presenting variants.
- design-shotgun/SKILL.md.tmpl: anti-convergence directive — each
variant must use a different font, palette, and layout. If two
variants look like siblings, one of them failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: context health soft directive in preamble (T2+)
Adds a "periodically self-summarize" nudge to long-running skills.
Soft directive only — no thresholds, no enforcement, no auto-commit.
Goal: self-awareness during /qa, /investigate, /cso etc. If you notice
yourself going in circles, STOP and reassess instead of thrashing.
Codex review caught that fake precision thresholds (15/30/45 tool calls)
were unimplementable — SKILL.md is a static prompt, not runtime code.
This ships the soft version only.
Changes:
- scripts/resolvers/preamble.ts: add generateContextHealth(), wire into
T2+ tier. Format: [PROGRESS] ... summary line. Explicit rule that
progress reporting must never mutate git state.
- All T2+ skill SKILL.md files regenerated to include the new section.
- Golden ship fixtures updated (T4 skill, picks up the change).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: model overlays with explicit --model flag (no auto-detect)
Adds a per-model behavioral patch layer orthogonal to the host axis.
Different LLMs have different tendencies (GPT won't stop, Gemini
over-explains, o-series wants structured output). Overlays nudge each
model toward better defaults for gstack workflows.
Codex review caught three landmines the prior reviews missed:
1. Host != model — Claude Code can run any Claude model, Codex runs
GPT/o-series, Cursor fronts multiple providers. Auto-detecting from
host would lie. Dropped auto-detect. --model is explicit (default
claude). Missing overlay file → empty string (graceful).
2. Import cycle — putting Model in resolvers/types.ts would cycle
through hosts/index. Created neutral scripts/models.ts instead.
3. "Final say" is dangerous — overlay at the end of preamble could
override STOP points, AskUserQuestion gates, /ship review gates.
Placed overlay after spawned-session-check but before voice + tier
sections. Wrapper heading adds explicit subordination language on
every overlay: "subordinate to skill workflow, STOP points,
AskUserQuestion gates, plan-mode safety, and /ship review gates."
Changes:
- scripts/models.ts: new neutral module. ALL_MODEL_NAMES, Model type,
resolveModel() for family heuristics (gpt-5.4-mini → gpt-5.4, o3 →
o-series, claude-opus-4-7 → claude), validateModel() helper.
- scripts/resolvers/types.ts: import Model, add ctx.model field.
- scripts/resolvers/model-overlay.ts: new resolver. Reads
model-overlays/{model}.md. Supports {{INHERIT:base}} directive at
top of file for concat (gpt-5.4 inherits gpt). Cycle guard.
- scripts/resolvers/index.ts: register MODEL_OVERLAY resolver.
- scripts/resolvers/preamble.ts: wire generateModelOverlay into
composition before voice. Print MODEL_OVERLAY: {model} in preamble
bash so users can see which overlay is active. Filter empty sections.
- scripts/gen-skill-docs.ts: parse --model CLI flag. Default claude.
Unknown model → throw with list of valid options.
- model-overlays/{claude,gpt,gpt-5.4,gemini,o-series}.md: behavioral
patches per model family. gpt-5.4.md uses {{INHERIT:gpt}} to extend
gpt.md without duplication.
- test/gen-skill-docs.test.ts: fix qa-only guardrail regex scope.
Was matching Edit/Glob/Grep anywhere after `allowed-tools:` in the
whole file. Now scoped to frontmatter only. Body prose (Claude
overlay references Edit as a tool) correctly no longer breaks it.
Verification:
- bun run gen:skill-docs --host all --dry-run → all fresh
- bun run gen:skill-docs --model gpt-5.4 → concat works, gpt.md +
gpt-5.4.md content appears in order
- bun run gen:skill-docs --model unknown → errors with valid list
- All generated skills contain MODEL_OVERLAY: claude in preamble
- Golden ship fixtures regenerated
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: continuous checkpoint mode with non-destructive WIP squash
Adds opt-in auto-commit during long sessions so work survives Claude
Code crashes, Conductor workspace handoffs, and context switches.
Local-only by default — pushing requires explicit opt-in.
Codex review caught multiple landmines that would have shipped:
1. checkpoint_push=true default would push WIP commits to shared
branches, trigger CI/deploys, expose secrets. Now default false.
2. Plan's original /ship squash (git reset --soft to merge base) was
destructive — uncommitted ALL branch commits, not just WIP, and
caused non-fast-forward pushes. Redesigned: rebase --autosquash
scoped to WIP commits only, with explicit fallback for WIP-only
branches and STOP-and-ask for conflicts.
3. gstack-config get returned empty for missing keys with exit 0,
ignoring the annotated defaults in the header comments. Fixed:
get now falls back to a lookup_default() table that is the
canonical source for defaults.
4. Telemetry default mismatched: header said 'anonymous' but runtime
treated empty as 'off'. Aligned: default is 'off' everywhere.
5. /checkpoint resume only read markdown checkpoint files, not the
WIP commit [gstack-context] bodies the plan referenced. Wired up
parsing of [gstack-context] blocks from WIP commits as a second
recovery trail alongside the markdown checkpoints.
Changes:
- bin/gstack-config: add checkpoint_mode (default explicit) and
checkpoint_push (default false) to CONFIG_HEADER. Add lookup_default()
as canonical default source. get() falls back to defaults when key
absent. list now shows value + source (set/default). New 'defaults'
subcommand to inspect the table.
- scripts/resolvers/preamble.ts: preamble bash reads _CHECKPOINT_MODE
and _CHECKPOINT_PUSH, prints CHECKPOINT_MODE: and CHECKPOINT_PUSH: so
the mode is visible. New generateContinuousCheckpoint() section in
T2+ tier describes WIP commit format with [gstack-context] body and
the rules (never git add -A, never commit broken tests, push only
if opted in). Example deliberately shows a clean-state context so
it doesn't contradict the rules.
- ship/SKILL.md.tmpl: new Step 5.75 WIP Commit Squash. Detects WIP
count, exports [gstack-context] blocks before squash (as backup),
uses rebase --autosquash for mixed branches and soft-reset only when
VERIFIED WIP-only. Explicit anti-footgun rules against blind soft-
reset. Aborts with BLOCKED status on conflict instead of destroying
non-WIP commits.
- checkpoint/SKILL.md.tmpl: new Step 1.5 to parse [gstack-context]
blocks from WIP commits via git log --grep="^WIP:". Merges with
markdown checkpoint for fuller session recovery.
- Golden ship fixtures regenerated (ship is T4, preamble change shows up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: feature discovery flow gated by per-feature markers
Extends generateUpgradeCheck() to surface new features once per user
after a just-upgraded session. No more silent features.
Codex review caught: spawned sessions (OpenClaw, etc.) must skip the
discovery prompt entirely — they can't interactively answer. Feature
discovery now checks SPAWNED_SESSION first and is silent in those.
Discovery is per-feature, not per-upgrade. Each feature has its own
marker file at ~/.claude/skills/gstack/.feature-prompted-{name}. Once
the user has been shown a feature (accepted, shown docs, or skipped),
the marker is touched and the prompt never fires again for that
feature. Future features get their own markers.
V1 features surfaced:
- continuous-checkpoint: offer to enable checkpoint_mode=continuous
- model-overlay: inform-only note about --model flag and MODEL_OVERLAY
line in preamble output
Max one prompt per session to avoid nagging. Fires only on JUST_UPGRADED
(not every session), plus spawned-session skip.
Changes:
- scripts/resolvers/preamble.ts: extend generateUpgradeCheck() with
feature discovery rules, per-marker-file semantics, spawned-session
exclusion, and max-one-per-session cap.
- All skill SKILL.md files regenerated to include the new section.
- Golden ship fixtures regenerated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: design taste engine with persistent schema
Adds a cross-session taste profile that learns from design-shotgun
approval/rejection decisions. Biases future design-consultation and
design-shotgun proposals toward the user's demonstrated preferences.
Codex review caught that the plan had "taste engine" as a vague goal
without schema, decay, migration, or placeholder insertion points. This
commit ships the full spec.
Schema v1 at ~/.gstack/projects/$SLUG/taste-profile.json:
- version, updated_at
- dimensions: fonts, colors, layouts, aesthetics — each with approved[]
and rejected[] preference lists
- sessions: last 50 (FIFO truncation), each with ts/action/variant/reason
- Preference: { value, confidence, approved_count, rejected_count, last_seen }
- Confidence: Laplace-smoothed approved/(total+1)
- Decay: 5% per week of inactivity, computed at read time (not write)
Changes:
- bin/gstack-taste-update: new CLI. Subcommands approved/rejected/show/
migrate. Parses reason string for dimension signals (e.g.,
"fonts: Geist; colors: slate; aesthetics: minimal"). Emits taste-drift
NOTE when a new signal contradicts a strong opposing signal. Legacy
approved.json aggregates migrate to v1 on next write.
- scripts/resolvers/design.ts: new generateTasteProfile() resolver.
Produces the prose that skills see: how to read the profile, how to
factor into proposals, conflict handling, schema migration.
- scripts/resolvers/index.ts: register TASTE_PROFILE and a BIN_DIR
resolver (returns ctx.paths.binDir, used by templates that shell out
to gstack-* binaries).
- design-consultation/SKILL.md.tmpl: insert {{TASTE_PROFILE}} placeholder
in Phase 1 right after the memorable-thing forcing question so the
Phase 3 proposal can factor in learned preferences.
- design-shotgun/SKILL.md.tmpl: taste memory section now reads
taste-profile.json via {{TASTE_PROFILE}}, falls back to per-session
approved.json (legacy). Approval flow documented to call
gstack-taste-update after user picks/rejects a variant.
Known gap: v1 extracts dimension signals from a reason string passed
by the caller ("fonts: X; colors: Y"). Future v2 can read EXIF or an
accompanying manifest written by design-shotgun alongside each variant
for automatic dimension extraction without needing the reason argument.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: multi-provider model benchmark (boil the ocean)
Adds the full spec Codex asked for: real provider adapters with auth
detection, normalized RunResult, pricing tables, tool compatibility
maps, parallel execution with error isolation, and table/JSON/markdown
output. Judge stays on Anthropic SDK as the single stable source of
quality scoring, gated behind --judge.
Codex flagged the original plan as massively under-scoped — the
existing runner is Claude-only and the judge is Anthropic-only. You
can't benchmark GPT or Gemini without real provider infrastructure.
This commit ships it.
New architecture:
test/helpers/providers/types.ts ProviderAdapter interface
test/helpers/providers/claude.ts wraps `claude -p --output-format json`
test/helpers/providers/gpt.ts wraps `codex exec --json`
test/helpers/providers/gemini.ts wraps `gemini -p --output-format stream-json --yolo`
test/helpers/pricing.ts per-model USD cost tables (quarterly)
test/helpers/tool-map.ts which tools each CLI exposes
test/helpers/benchmark-runner.ts orchestrator (Promise.allSettled)
test/helpers/benchmark-judge.ts Anthropic SDK quality scorer
bin/gstack-model-benchmark CLI entry
test/benchmark-runner.test.ts 9 unit tests (cost math, formatters, tool-map)
Per-provider error isolation:
- auth → record reason, don't abort batch
- timeout → record reason, don't abort batch
- rate_limit → record reason, don't abort batch
- binary_missing → record in available() check, skip if --skip-unavailable
Pricing correction: cached input tokens are disjoint from uncached
input tokens (Anthropic/OpenAI report them separately). Original
math subtracted them, producing negative costs. Now adds cached at
the 10% discount alongside the full uncached input cost.
CLI:
gstack-model-benchmark --prompt "..." --models claude,gpt,gemini
gstack-model-benchmark ./prompt.txt --output json --judge
gstack-model-benchmark ./prompt.txt --models claude --timeout-ms 60000
Output formats: table (default), json, markdown. Each shows model,
latency, in→out tokens, cost, quality (when --judge used), tool calls,
and any errors.
Known limitations for v1:
- Claude adapter approximates toolCalls as num_turns (stream-json
would give exact counts; v2 can upgrade).
- Live E2E tests (test/providers.e2e.test.ts) not included — they
require CI secrets for all three providers. Unit tests cover the
shape and math.
- Provider CLIs sometimes return non-JSON error text to stdout; the
parsers fall back to treating raw output as plain text in that case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: standalone methodology skill publishing via gstack-publish
Ships the marketplace-distribution half of Item 5 (reframed): publish
the existing standalone OpenClaw methodology skills to multiple
marketplaces with one command.
Codex review caught that the original plan assumed raw generated
multi-host skills could be published directly. They can't — those
depend on gstack binaries, generated host paths, tool names, and
telemetry. The correct artifact class is hand-crafted standalone
skills in openclaw/skills/gstack-openclaw-* (already exist and work
without gstack runtime). This commit adds the wrapper that publishes
them to ClawHub + SkillsMP + Vercel Skills.sh with per-marketplace
error isolation and dry-run validation.
Changes:
- skills.json: root manifest with 4 skills (office-hours, ceo-review,
investigate, retro) each pointing at its openclaw/skills source.
Each skill declares per-marketplace targets with a slug, a publish
flag, and a compatible-hosts list. Marketplace configs include CLI
name, login command, publish command template (with placeholder
substitution), docs URL, and auth_check command.
- bin/gstack-publish: new CLI. Subcommands:
gstack-publish Publish all skills
gstack-publish <slug> Publish one skill
gstack-publish --dry-run Validate + auth-check without publishing
gstack-publish --list List skills + marketplace targets
Features:
* Manifest validation (missing source files, missing slugs, empty
marketplace list all reported).
* Per-marketplace auth check before any publish attempt.
* Per-skill / per-marketplace error isolation: one failure doesn't
abort the batch.
* Idempotent — re-running with the same version is safe; markets
that reject duplicate versions report it as a failure for that
single target without affecting others.
* --dry-run walks the full pipeline but skips execSync; useful in
CI to validate manifest before bumping version.
Tested locally: clawhub auth detected, skillsmp/vercel CLIs not
installed (marked NOT READY and skipped cleanly in dry-run).
Follow-up work (tracked in TODOS.md later):
- Version-bump helper that reads openclaw/skills/*/SKILL.md frontmatter
and updates skills.json in lockstep.
- CI workflow that runs gstack-publish --dry-run on every PR and
gstack-publish on tags.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: split preamble.ts into submodules (byte-identical output)
Splits scripts/resolvers/preamble.ts (841 lines, 18 generator functions +
composition root) into one file per generator under
scripts/resolvers/preamble/. Root preamble.ts becomes a thin composition
layer (~80 lines of imports + generatePreamble).
Before:
scripts/resolvers/preamble.ts 841 lines
After:
scripts/resolvers/preamble.ts 83 lines
scripts/resolvers/preamble/generate-preamble-bash.ts 97 lines
scripts/resolvers/preamble/generate-upgrade-check.ts 48 lines
scripts/resolvers/preamble/generate-lake-intro.ts 16 lines
scripts/resolvers/preamble/generate-telemetry-prompt.ts 37 lines
scripts/resolvers/preamble/generate-proactive-prompt.ts 25 lines
scripts/resolvers/preamble/generate-routing-injection.ts 49 lines
scripts/resolvers/preamble/generate-vendoring-deprecation.ts 36 lines
scripts/resolvers/preamble/generate-spawned-session-check.ts 11 lines
scripts/resolvers/preamble/generate-ask-user-format.ts 16 lines
scripts/resolvers/preamble/generate-completeness-section.ts 19 lines
scripts/resolvers/preamble/generate-repo-mode-section.ts 12 lines
scripts/resolvers/preamble/generate-test-failure-triage.ts 108 lines
scripts/resolvers/preamble/generate-search-before-building.ts 14 lines
scripts/resolvers/preamble/generate-completion-status.ts 161 lines
scripts/resolvers/preamble/generate-voice-directive.ts 60 lines
scripts/resolvers/preamble/generate-context-recovery.ts 51 lines
scripts/resolvers/preamble/generate-continuous-checkpoint.ts 48 lines
scripts/resolvers/preamble/generate-context-health.ts 31 lines
Byte-identity verification (the real gate per Codex correction):
- Before refactor: snapshotted 135 generated SKILL.md files via
`find -name SKILL.md -type f | grep -v /gstack/` across all hosts.
- After refactor: regenerated with `bun run gen:skill-docs --host all`
and re-snapshotted.
- `diff -r baseline after` returned zero differences and exit 0.
The `--host all --dry-run` gate passes too. No template or host behavior
changes — purely a code-organization refactor.
Test fix: audit-compliance.test.ts's telemetry check previously grepped
preamble.ts directly for `_TEL != "off"`. After the refactor that logic
lives in preamble/generate-preamble-bash.ts. Test now concatenates all
preamble submodule sources before asserting — tracks the semantic contract,
not the file layout. Doing the minimum rewrite preserves the test's intent
(conditional telemetry) without coupling it to file boundaries.
Why now: we were in-session with full context. Codex had downgraded this
from mandatory to optional, but the preamble had grown to 841 lines and
was getting harder to navigate. User asked "why not?" given the context
was hot. Shipping it as a clean bisectable commit while all the prior
preamble.ts changes are fresh reduces rebase pain later.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.19.0.0)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: trim verbose preamble + coverage audit prose
Compress without removing behavior or voice. Three targeted cuts:
1. scripts/resolvers/testing.ts coverage diagram example: 40 lines → 14
lines. Two-column ASCII layout instead of stacked sections.
Preserves all required regression-guard phrases (processPayment,
refundPayment, billing.test.ts, checkout.e2e.ts, COVERAGE, QUALITY,
GAPS, Code paths, User flows, ASCII coverage diagram).
2. scripts/resolvers/preamble/generate-completion-status.ts Plan Status
Footer: was 35 lines with embedded markdown table example, now 7
lines that describe the table inline. The footer fires only at
ExitPlanMode time — Claude can construct the placeholder table from
the inline description without copying a literal example.
3. Same file's Plan Mode Safe Operations + Skill Invocation During Plan
Mode sections compressed from ~25 lines combined to ~12. Preserves
all required test phrases (precedence over generic plan mode behavior,
Do not continue the workflow, cancel the skill or leave plan mode,
PLAN MODE EXCEPTION).
NOT touched:
- Voice directive (Garry's voice — protected per CLAUDE.md)
- Office-hours Phase 6 Handoff (Garry's voice + YC pitch)
- Test bootstrap, review army, plan completion (carefully tuned behavior)
Token savings (per skill, system-wide):
ship/SKILL.md 35474 → 34992 tokens (-482)
plan-ceo-review 29436 → 28940 (-496)
office-hours 26700 → 26204 (-496)
Still over the 25K ceiling. Bigger reduction requires restructure
(move large resolvers to externally-referenced docs, split /ship into
ship-quick + ship-full, or refactor the coverage audit + review army
into shorter prose). That's a follow-up — added to TODOS.
Tests: 420/420 pass on gen-skill-docs.test.ts + host-config.test.ts.
Goldens regenerated for claude/codex/factory ship.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): install Node.js from official tarball instead of NodeSource apt setup
The CI Dockerfile's Node install was failing on ubicloud runners. NodeSource's
setup_22.x script runs two internal apt operations that both depend on
archive.ubuntu.com + security.ubuntu.com being reachable:
1. apt-get update (to refresh package lists)
2. apt-get install gnupg (as a prerequisite for its gpg keyring)
Ubicloud's CI runners frequently can't reach those mirrors — last build hit
~2min of connection timeouts to every security.ubuntu.com IP (185.125.190.82,
91.189.91.83, 91.189.92.24, etc.) plus archive.ubuntu.com mirrors. Compounding
this: on Ubuntu 24.04 (noble) "gnupg" was renamed to "gpg" and "gpgconf".
NodeSource's setup script still looks for "gnupg", so even when apt works,
it fails with "Package 'gnupg' has no installation candidate." The subsequent
apt-get install nodejs then fails because the NodeSource repo was never added.
Fix: drop NodeSource entirely. Download Node.js v22.20.0 from nodejs.org as a
tarball, extract to /usr/local. One host, no apt, no script, no keyring.
Before:
RUN curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
&& apt-get install -y --no-install-recommends nodejs ...
After:
ENV NODE_VERSION=22.20.0
RUN curl -fsSL "https://nodejs.org/dist/v${NODE_VERSION}/node-v${NODE_VERSION}-linux-x64.tar.xz" -o /tmp/node.tar.xz \
&& tar -xJ -C /usr/local --strip-components=1 --no-same-owner -f /tmp/node.tar.xz \
&& rm -f /tmp/node.tar.xz \
&& node --version && npm --version
Same installed path (/usr/local/bin/node and npm). Pinned version for
reproducibility. Version is bump-visible in the Dockerfile now.
Does not address the separate apt flakiness that affects the GitHub CLI
install (line 17) or `npx playwright install-deps chromium` (line 33) —
those use apt too. If those fail on a future build we can address then.
Failing job: build-image (71777913820)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: raise skill token ceiling warning from 25K to 40K
The 25K ceiling predated flagship models with 200K-1M windows and assumed
every skill prompt dominates context cost. Modern reality: prompt caching
amortizes the skill load across invocations, and three carefully-tuned
skills (ship, plan-ceo-review, office-hours) legitimately pack 25-35K
tokens of behavior that can't be cut without degrading quality or removing
protected content (Garry's voice, YC pitch, specialist review instructions).
We made the safe prose cuts earlier (coverage diagram, plan status footer,
plan mode operations). The remaining gap is structural — real compression
would require splitting /ship into ship-quick vs ship-full, externalizing
large resolvers to reference docs, or removing detailed skill behavior.
Each is 1-2 days of work. The cost of the warning firing is zero (it's
a warning, not an error). The cost of hitting it is ~15¢ per invocation
at worst, amortized further by prompt caching.
Raising to 40K catches what it's supposed to catch — a runaway 10K+ token
growth in a single release — without crying wolf on legitimately big
skills. Reference doc in CLAUDE.md updated to reflect the new philosophy:
when you hit 40K, ask WHAT grew, don't blindly compress tuned prose.
scripts/gen-skill-docs.ts: TOKEN_CEILING_BYTES 100_000 → 160_000.
CLAUDE.md: document the "watch for feature bloat, not force compression"
intent of the ceiling.
Verification: `bun run gen:skill-docs --host all` shows zero TOKEN
CEILING warnings under the new 40K threshold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci): install xz-utils so Node tarball extraction works
The direct-tarball Node install (switched from NodeSource apt in the last
CI fix) failed with "xz: Cannot exec: No such file or directory" because
Ubuntu 24.04 base doesn't include xz-utils. Node ships .tar.xz by default,
and `tar -xJ` shells out to xz, which was missing.
Add xz-utils to the base apt install alongside git/curl/unzip/etc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(benchmark): pass --skip-git-repo-check to codex adapter
The gpt provider adapter spawns `codex exec -C <workdir>` with arbitrary
working directories (benchmark temp dirs, non-git paths). Without
`--skip-git-repo-check`, codex refuses to run and returns "Not inside a
trusted directory" — surfaced as a generic error.code='unknown' that
looks like an API failure.
Benchmarks don't care about codex's git-repo trust model; we just want
the prompt executed. Surfaced by the new provider live E2E test on a
temp workdir.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(benchmark): add --dry-run flag to gstack-model-benchmark
Matches gstack-publish --dry-run semantics. Validates the provider list,
resolves per-adapter auth, echoes the resolved flag values, and exits
without invoking any provider CLI. Zero-cost pre-flight for CI pipelines
and for catching auth drift before starting a paid benchmark run.
Output shape:
== gstack-model-benchmark --dry-run ==
prompt: <truncated>
providers: claude, gpt, gemini
workdir: /tmp/...
timeout_ms: 300000
output: table
judge: off
Adapter availability:
claude: OK
gpt: NOT READY — <reason>
gemini: NOT READY — <reason>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: lite E2E coverage for benchmark, taste engine, publish
Fills real coverage gaps in v0.19.0.0 primitives. 44 new deterministic
tests (gate tier, ~3s) + 8 live-API tests (periodic tier).
New gate-tier test files (free, <3s total):
- test/taste-engine.test.ts — 24 tests against gstack-taste-update:
schema shape, Laplace-smoothed confidence, 5%/week decay clamped at 0,
multi-dimension extraction, case-insensitive matching, session cap,
legacy profile migration with session truncation, taste-drift conflict
warning, malformed-JSON recovery, missing-variant exit code.
- test/publish-dry-run.test.ts — 13 tests against gstack-publish --dry-run:
manifest parsing, missing/malformed JSON, per-skill validation errors
(missing source file / slug / version / marketplaces), slug filter,
unknown-skill exit, per-marketplace auth isolation (fake marketplaces
with always-pass / always-fail / missing-binary CLIs), and a sanity
check against the real repo manifest.
- test/benchmark-cli.test.ts — 11 tests against gstack-model-benchmark
--dry-run: provider default, unknown-provider WARN, empty list
fallback, flag passthrough (timeout/workdir/judge/output), long-prompt
truncation, prompt resolution (inline vs file vs positional), missing
prompt exit.
New periodic-tier test file (paid, gated EVALS=1):
- test/skill-e2e-benchmark-providers.test.ts — 8 tests hitting real
claude, codex, gemini CLIs with a trivial prompt (~$0.001/provider).
Verifies output parsing, token accounting, cost estimation, timeout
error.code semantics, Promise.allSettled parallel isolation.
Per-provider availability gate — unauthed providers skip cleanly.
This suite already caught one real bug (codex adapter missing
--skip-git-repo-check, fixed in 5260987d).
Registered `benchmark-providers-live` in touchfiles.ts (periodic tier,
triggered by changes to bin/gstack-model-benchmark, providers/**,
benchmark-runner.ts, pricing.ts).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(benchmark): dedupe providers in --models
`--models claude,claude,gpt` previously produced a list with a duplicate
entry, meaning the benchmark would run claude twice and bill for two
runs. Surfaced by /review on this branch.
Use a Set internally; return Array.from(seen) to preserve type + order
of first occurrence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test: /review hardening — NOT-READY env isolation, workdir cleanup, perf
Applied from the adversarial subagent pass during /review on this branch:
- test/benchmark-cli.test.ts — new "NOT READY path fires when auth env
vars are stripped" test. The default dry-run test always showed OK on
dev machines with auth, hiding regressions in the remediation-hint
branch. Stripped env (no auth vars, HOME→empty tmpdir) now force-
exercises gpt + gemini NOT READY paths and asserts every NOT READY
line includes a concrete remediation hint (install/login/export).
(claude adapter's os.homedir() call is Bun-cached; the 2-of-3 adapter
coverage is sufficient to exercise the branch.)
- test/taste-engine.test.ts — session-cap test rewritten to seed the
profile with 50 entries + one real CLI call, instead of 55 sequential
subprocess spawns. Same coverage (FIFO eviction at the boundary), ~5s
faster CI time. Also pins first-casing-wins on the Geist/GEIST merge
assertion — bumpPref() keeps the first-arrival casing, so the test
documents that policy.
- test/skill-e2e-benchmark-providers.test.ts — workdir creation moved
from module-load into beforeAll, cleanup added in afterAll. Previous
shape leaked a /tmp/bench-e2e-* dir every CI run.
- test/publish-dry-run.test.ts — removed unused empty test/helpers
mkdirSync from the sandbox setup. The bin doesn't import from there,
so the empty dir was a footgun for future maintainers.
- test/helpers/providers/gpt.ts — expanded the inline comment on
`--skip-git-repo-check` to explicitly note that `-s read-only` is now
load-bearing safety (the trust prompt was the secondary boundary;
removing read-only while keeping skip-git-repo-check would be unsafe).
Net: 45 passing tests (was 44), session-cap test 5s faster, one real
regression surface covered that didn't exist before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: surface v0.19 binaries and continuous checkpoint in README
The /review doc-staleness check flagged that v0.19.0.0 ships three new CLIs
(gstack-model-benchmark, gstack-publish, gstack-taste-update) and an opt-in
continuous checkpoint mode, none of which were visible in README's Power
tools section. New users couldn't find them without reading CHANGELOG.
Added:
- "New binaries (v0.19)" subsection with one-row descriptions for each CLI
- "Continuous checkpoint mode (opt-in, local by default)" subsection
explaining WIP auto-commit + [gstack-context] body + /ship squash +
/checkpoint resume
CHANGELOG entry already has good voice from /ship; no polish needed.
VERSION already at 0.19.0.0. Other docs (ARCHITECTURE/CONTRIBUTING/BROWSER)
don't reference this surface — scoped intentionally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(ship): Step 19.5 — offer gstack-publish for methodology skill changes
Wires the orphaned gstack-publish binary into /ship. When a PR touches
any standalone methodology skill (openclaw/skills/gstack-*/SKILL.md) or
skills.json, /ship now runs gstack-publish --dry-run after PR creation
and asks the user if they want to actually publish.
Previously, the only way to discover gstack-publish was reading the
CHANGELOG or README. Most methodology skill updates landed on main
without ever being pushed to ClawHub / SkillsMP / Vercel Skills.sh,
defeating the whole point of having a marketplace publisher.
The check is conditional — for PRs that don't touch methodology skills
(the common case), this step is a silent no-op. Dry-run runs first so
the user sees the full list of what would publish and which marketplaces
are authed before committing.
Golden fixtures (claude/codex/factory) regenerated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(benchmark-models): new skill wrapping gstack-model-benchmark
Wires the orphaned gstack-model-benchmark binary into a dedicated skill
so users can discover cross-model benchmarking via /benchmark-models or
voice triggers ("compare models", "which model is best").
Deliberately separate from /benchmark (page performance) because the
two surfaces test completely different things — confusing them would
muddy both.
Flow:
1. Pick a prompt (an existing SKILL.md file, inline text, or file path)
2. Confirm providers (dry-run shows auth status per provider)
3. Decide on --judge (adds ~$0.05, scores output quality 0-10)
4. Run the benchmark — table output
5. Interpret results (fastest / cheapest / highest quality)
6. Offer to save to ~/.gstack/benchmarks/<date>.json for trend tracking
Uses gstack-model-benchmark --dry-run as a safety gate — auth status is
visible BEFORE the user spends API calls. If zero providers are authed,
the skill stops cleanly rather than attempting a run that produces no
useful output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: v1.3.0.0 — complete CHANGELOG + bump for post-1.2 scope additions
VERSION 1.2.0.0 → 1.3.0.0. The original 1.2 entry was written before I
added substantial new scope: the /benchmark-models skill, /ship Step 19.5
gstack-publish integration, --dry-run on gstack-model-benchmark, and the
lite E2E test coverage (4 new test files). A minor bump gives those
changes their own version line instead of silently folding them into
1.2's scope.
CHANGELOG additions under 1.3.0.0:
- /benchmark-models skill (new Added)
- /ship Step 19.5 publish check (new Added)
- gstack-model-benchmark --dry-run (new Added)
- Token ceiling 25K → 40K (moved to Changed)
- New Fixed section — codex adapter --skip-git-repo-check, --models
dedupe, CI Dockerfile xz-utils + nodejs.org tarball
- 4 new test files documented under contributors (taste-engine,
publish-dry-run, benchmark-cli, skill-e2e-benchmark-providers)
- Ship golden fixtures for claude/codex/factory hosts
Pre-existing 1.2 content preserved verbatim — no entries clobbered or
reordered. Sequence remains contiguous (1.3.0.0 → 1.1.3.0 → 1.1.2.0 →
1.1.1.0 → 1.1.0.0 → 1.0.0.0 → 0.19.0.0 → ...).
package.json and VERSION both at 1.3.0.0. No drift.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: adopt gbrain's release-summary CHANGELOG format + apply to v1.3
Ported the "release-summary format" rules from ~/git/gbrain/CLAUDE.md
(lines 291-354) into gstack's CLAUDE.md under the existing
"CHANGELOG + VERSION style" section. Every future `## [X.Y.Z]` entry
now needs a verdict-style release summary at the top:
1. Two-line bold headline (10-14 words)
2. Lead paragraph (3-5 sentences)
3. "Numbers that matter" with BEFORE / AFTER / Δ table
4. "What this means for [audience]" closer
5. `### Itemized changes` header
6. Existing itemized subsections below
Rewrote v1.3.0.0 entry to match. Preserved every existing bullet in
Added / Changed / Fixed / For contributors (no content clobbered per
the CLAUDE.md CHANGELOG rule).
Numbers in the v1.3 release summary are verifiable — every row of the
BEFORE / AFTER table has a reproducible command listed in the setup
paragraph (git log, bun test, grep for wiring status). No made-up
metrics.
Also added the gbrain "always credit community contributions" rule to
the itemized-changes section. `Contributed by @username` for every
community PR that lands in a CHANGELOG entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: remove gstack-publish — no real user need
User feedback: "i don't think i would use gstack-publish, i think we
should remove it." Agreed. The CLI + marketplace wiring was an
ambitious but speculative primitive. Zero users, zero validated demand,
and the existing manual `clawhub publish` workflow already covers the
real case (OpenClaw methodology skill publishing).
Deleted:
- bin/gstack-publish (the CLI)
- skills.json (the marketplace manifest)
- test/publish-dry-run.test.ts (13 tests)
- ship/SKILL.md.tmpl Step 19.5 — the methodology-skill publish-on-ship
check. No target to dispatch to anymore.
- README.md Power tools row for gstack-publish
Updated:
- bin/gstack-model-benchmark doc comment: dropped "matches gstack-publish
--dry-run semantics" reference (self-describing flag now)
- CHANGELOG 1.3.0.0 entry:
* Release summary: "three new binaries" → "two new binaries".
Dropped the /ship publish-check narrative.
* Numbers table: "1 of 3 → 3 of 3 wired" → "1 of 2 → 2 of 2 wired".
Deterministic test count: 45 → 32 (removed publish-dry-run's 13).
* Added section: removed gstack-publish CLI bullet + /ship Step 19.5
bullet.
* "What this means for users" closer: replaced the /ship publish
paragraph with the design-taste-engine learning loop, which IS
real, wired, and something users hit every week via /design-shotgun.
* Contributors section: "Four new test files" → "Three new test files"
Retained:
- openclaw/skills/gstack-openclaw-* skill dirs (pre-existed this PR,
still publishable manually via `clawhub publish`, useful standalone
for ClawHub installs)
- CLAUDE.md publishing-native-skills section (same rationale)
Regenerated SKILL.md across all hosts. Ship golden fixtures refreshed
for claude/codex/factory. 455 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(CHANGELOG): reorder v1.3 entry around day-to-day user wins
Previous entry led with internal metrics (CLIs wired to skills, preamble
line count, adapter bugs caught in CI). Useful to contributors, invisible
to users. Rewrote the release summary and Added section to lead with
what a day-to-day gstack user actually experiences.
Release summary changes:
- Headline: "Every new CLI wired to a slash command" → "Your design
skills learn your taste. Your session state survives a laptop close."
- Lead paragraph: shifted from "primitives discoverable from /commands"
to concrete day-to-day wins (design-shotgun taste memory, design-
consultation anti-slop gates, continuous checkpoint survival).
- Numbers table: swapped internal metrics (CLI wiring %, test counts,
preamble line count) for user-visible ones:
- Design-variant convergence gate (0 → 3 axes required)
- AI-slop font blacklist (~8 → 10+ fonts)
- Taste memory across sessions (none → per-project JSON with decay)
- Session state after crash (lost → auto-WIP with structured body)
- /context-restore sources (markdown only → + WIP commits)
- Models with behavioral overlays (1 → 5)
- "Most striking" interpretation: reframed around the mid-session
crash survival story instead of the codex adapter bug catch.
- "What this means" closer: reframed around /design-shotgun + /design-
consultation + continuous checkpoint workflow instead of
/benchmark-models.
Added section — reorganized into six subsections by user value:
1. Design skills that stop looking like AI
(anti-slop constraints, taste engine)
2. Session state that survives a crash
(continuous checkpoint, /context-restore WIP reading,
/ship non-destructive squash)
3. Quality-of-life
(feature discovery prompt, context health soft directive)
4. Cross-host support
(--model flag + 5 overlays)
5. Config
(gstack-config list/defaults, checkpoint_mode/push keys)
6. Power-user / internal
(gstack-model-benchmark + /benchmark-models skill — grouped and
pushed to the bottom since it's more of a research tool than a
daily workflow piece)
Changed / Fixed / For contributors sections unchanged. No content
clobbered per CLAUDE.md CHANGELOG rules — every existing bullet is
preserved, just reordered and grouped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(CHANGELOG): reframe v1.3 entry around transparency vs laptop-close
User feedback: "'closing your laptop' in the changelog is overstated, i
mean claude code does already have session management. i think the use
of the context save restore is mainly just another tool that is more in
your control instead of opaque and a part of CC." Correct. CC handles
session persistence on its own; continuous checkpoint isn't filling a
gap there, it's giving users a parallel, inspectable, portable track.
Reframed every place the old copy overstated:
- Headline: "Your session state survives a laptop close" → "Your
session state lives in git, not a black box."
- Lead paragraph: dropped the "closing your laptop mid-refactor doesn't
vaporize your decisions" line. Now frames continuous checkpoint as
explicitly running alongside CC's built-in session management, not
replacing it. Emphasizes grep-ability, portability across tools and
branches.
- Numbers table row: "Session state after mid-refactor crash: lost
since last manual commit → auto-WIP commits" → "Session state
format: Claude Code's opaque session store → git commits +
[gstack-context] bodies + markdown (parallel track)". Honest about
what's actually changing.
- "Most striking" interpretation: replaced the "used to cost you every
decision" framing with the real user value — session state stops
being a black box, `git log --grep "WIP:"` shows the whole thread,
any tool reading git can see it.
- "What this means" closer: replaced "survives crashes, context
switches, and forgotten laptops" with accurate framing — parallel
track alongside CC's own, inspectable, portable, useful when you
want to review or hand off work.
- Added section: "Session state that survives a crash" subsection
renamed to "Session state you can see, grep, and move". Lead bullet
now explicitly notes continuous checkpoint runs alongside CC session
management, not instead.
No content clobbered. All other bullets and sections unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(CHANGELOG): correct session-state location — home dir by default, git only on opt-in
User correction: "wait is our session management really checked into
git? i don't think that's right, isn't it just saved in your home
dir?" Right. I had the location wrong. The default session-save
mechanism (`/context-save` + `/context-restore`) writes markdown
files to `~/.gstack/projects/$SLUG/checkpoints/` — HOME, not git.
Continuous checkpoint mode (opt-in) is what writes git commits.
Previous copy conflated the two and implied "lives in git" as the
default state, which is wrong.
Every affected location updated:
- Headline: "lives in git, not a black box" → "becomes files you
can grep, not a black box." Removes the false implication that
session state lands in git by default.
- Lead paragraph: now explicitly names the two separate mechanisms.
`/context-save` writes plaintext markdown to `~/.gstack/projects/
$SLUG/checkpoints/` (the default). Continuous checkpoint mode
(opt-in) additionally drops WIP: commits into the git log.
- Numbers table row: "Session state format" now reads "markdown in
`~/.gstack/` by default, plus WIP: git commits if you opt into
continuous mode (parallel track)." Tells the truth about which
path is default vs opt-in.
- "Most striking" row interpretation: now names both paths. Default
path = markdown files in home dir. Opt-in continuous mode = WIP:
commits in project git log. Either way, plain text the user owns.
- "What this means" closer: similarly names both paths explicitly.
"markdown files in your home directory by default, plus git
commits if you opt into continuous mode."
- Continuous checkpoint mode Added bullet: clarifies the commits
land in "your project's git log" (not implied to be the default),
and notes it runs alongside BOTH Claude Code's built-in session
management AND the default `/context-save` markdown flow.
No other bullets or sections touched. No content clobbered.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2764 lines
116 KiB
TypeScript
2764 lines
116 KiB
TypeScript
import { describe, test, expect } from 'bun:test';
|
|
import { COMMAND_DESCRIPTIONS } from '../browse/src/commands';
|
|
import { SNAPSHOT_FLAGS } from '../browse/src/snapshot';
|
|
import * as fs from 'fs';
|
|
import * as path from 'path';
|
|
import * as os from 'os';
|
|
|
|
const ROOT = path.resolve(import.meta.dir, '..');
|
|
const MAX_SKILL_DESCRIPTION_LENGTH = 1024;
|
|
|
|
function extractDescription(content: string): string {
|
|
const fmEnd = content.indexOf('\n---', 4);
|
|
expect(fmEnd).toBeGreaterThan(0);
|
|
const frontmatter = content.slice(4, fmEnd);
|
|
const lines = frontmatter.split('\n');
|
|
let description = '';
|
|
let inDescription = false;
|
|
const descLines: string[] = [];
|
|
|
|
for (const line of lines) {
|
|
if (line.match(/^description:\s*\|?\s*$/)) {
|
|
inDescription = true;
|
|
continue;
|
|
}
|
|
if (line.match(/^description:\s*\S/)) {
|
|
return line.replace(/^description:\s*/, '').trim();
|
|
}
|
|
if (inDescription) {
|
|
if (line === '' || line.match(/^\s/)) {
|
|
descLines.push(line.replace(/^ /, ''));
|
|
} else {
|
|
break;
|
|
}
|
|
}
|
|
}
|
|
|
|
if (descLines.length > 0) {
|
|
description = descLines.join('\n').trim();
|
|
}
|
|
return description;
|
|
}
|
|
|
|
// Dynamic template discovery — matches the generator's findTemplates() behavior.
|
|
// New skills automatically get test coverage without updating a static list.
|
|
const ALL_SKILLS = (() => {
|
|
const skills: Array<{ dir: string; name: string }> = [];
|
|
if (fs.existsSync(path.join(ROOT, 'SKILL.md.tmpl'))) {
|
|
skills.push({ dir: '.', name: 'root gstack' });
|
|
}
|
|
for (const entry of fs.readdirSync(ROOT, { withFileTypes: true })) {
|
|
if (!entry.isDirectory() || entry.name.startsWith('.') || entry.name === 'node_modules') continue;
|
|
if (fs.existsSync(path.join(ROOT, entry.name, 'SKILL.md.tmpl'))) {
|
|
skills.push({ dir: entry.name, name: entry.name });
|
|
}
|
|
}
|
|
return skills;
|
|
})();
|
|
|
|
describe('gen-skill-docs', () => {
|
|
test('generated SKILL.md contains all command categories', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
const categories = new Set(Object.values(COMMAND_DESCRIPTIONS).map(d => d.category));
|
|
for (const cat of categories) {
|
|
expect(content).toContain(`### ${cat}`);
|
|
}
|
|
});
|
|
|
|
test('generated SKILL.md contains all commands', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
for (const [cmd, meta] of Object.entries(COMMAND_DESCRIPTIONS)) {
|
|
const display = meta.usage || cmd;
|
|
expect(content).toContain(display);
|
|
}
|
|
});
|
|
|
|
test('command table is sorted alphabetically within categories', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
// Extract command names from the Navigation section as a test
|
|
const navSection = content.match(/### Navigation\n\|.*\n\|.*\n([\s\S]*?)(?=\n###|\n## )/);
|
|
expect(navSection).not.toBeNull();
|
|
const rows = navSection![1].trim().split('\n');
|
|
const commands = rows.map(r => {
|
|
const match = r.match(/\| `(\w+)/);
|
|
return match ? match[1] : '';
|
|
}).filter(Boolean);
|
|
const sorted = [...commands].sort();
|
|
expect(commands).toEqual(sorted);
|
|
});
|
|
|
|
test('generated header is present in SKILL.md', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('AUTO-GENERATED from SKILL.md.tmpl');
|
|
expect(content).toContain('Regenerate: bun run gen:skill-docs');
|
|
});
|
|
|
|
test('generated header is present in browse/SKILL.md', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'browse', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('AUTO-GENERATED from SKILL.md.tmpl');
|
|
});
|
|
|
|
test('snapshot flags section contains all flags', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
for (const flag of SNAPSHOT_FLAGS) {
|
|
expect(content).toContain(flag.short);
|
|
expect(content).toContain(flag.description);
|
|
}
|
|
});
|
|
|
|
test('every skill has a SKILL.md.tmpl template', () => {
|
|
for (const skill of ALL_SKILLS) {
|
|
const tmplPath = path.join(ROOT, skill.dir, 'SKILL.md.tmpl');
|
|
expect(fs.existsSync(tmplPath)).toBe(true);
|
|
}
|
|
});
|
|
|
|
test('every skill has a generated SKILL.md with auto-generated header', () => {
|
|
for (const skill of ALL_SKILLS) {
|
|
const mdPath = path.join(ROOT, skill.dir, 'SKILL.md');
|
|
expect(fs.existsSync(mdPath)).toBe(true);
|
|
const content = fs.readFileSync(mdPath, 'utf-8');
|
|
expect(content).toContain('AUTO-GENERATED from SKILL.md.tmpl');
|
|
expect(content).toContain('Regenerate: bun run gen:skill-docs');
|
|
}
|
|
});
|
|
|
|
test('every generated SKILL.md has valid YAML frontmatter', () => {
|
|
for (const skill of ALL_SKILLS) {
|
|
const content = fs.readFileSync(path.join(ROOT, skill.dir, 'SKILL.md'), 'utf-8');
|
|
expect(content.startsWith('---\n')).toBe(true);
|
|
expect(content).toContain('name:');
|
|
expect(content).toContain('description:');
|
|
}
|
|
});
|
|
|
|
test(`every generated SKILL.md description stays within ${MAX_SKILL_DESCRIPTION_LENGTH} chars`, () => {
|
|
for (const skill of ALL_SKILLS) {
|
|
const content = fs.readFileSync(path.join(ROOT, skill.dir, 'SKILL.md'), 'utf-8');
|
|
const description = extractDescription(content);
|
|
expect(description.length).toBeLessThanOrEqual(MAX_SKILL_DESCRIPTION_LENGTH);
|
|
}
|
|
});
|
|
|
|
test(`every Codex SKILL.md description stays within ${MAX_SKILL_DESCRIPTION_LENGTH} chars`, () => {
|
|
const agentsDir = path.join(ROOT, '.agents', 'skills');
|
|
if (!fs.existsSync(agentsDir)) return; // skip if not generated
|
|
for (const entry of fs.readdirSync(agentsDir, { withFileTypes: true })) {
|
|
if (!entry.isDirectory()) continue;
|
|
const skillMd = path.join(agentsDir, entry.name, 'SKILL.md');
|
|
if (!fs.existsSync(skillMd)) continue;
|
|
const content = fs.readFileSync(skillMd, 'utf-8');
|
|
const description = extractDescription(content);
|
|
expect(description.length).toBeLessThanOrEqual(MAX_SKILL_DESCRIPTION_LENGTH);
|
|
}
|
|
});
|
|
|
|
test('every Codex SKILL.md description stays under 900-char warning threshold', () => {
|
|
const WARN_THRESHOLD = 900;
|
|
const agentsDir = path.join(ROOT, '.agents', 'skills');
|
|
if (!fs.existsSync(agentsDir)) return;
|
|
const violations: string[] = [];
|
|
for (const entry of fs.readdirSync(agentsDir, { withFileTypes: true })) {
|
|
if (!entry.isDirectory()) continue;
|
|
const skillMd = path.join(agentsDir, entry.name, 'SKILL.md');
|
|
if (!fs.existsSync(skillMd)) continue;
|
|
const content = fs.readFileSync(skillMd, 'utf-8');
|
|
const description = extractDescription(content);
|
|
if (description.length > WARN_THRESHOLD) {
|
|
violations.push(`${entry.name}: ${description.length} chars (limit ${MAX_SKILL_DESCRIPTION_LENGTH}, ${MAX_SKILL_DESCRIPTION_LENGTH - description.length} remaining)`);
|
|
}
|
|
}
|
|
expect(violations).toEqual([]);
|
|
});
|
|
|
|
test('package.json version matches VERSION file', () => {
|
|
const pkg = JSON.parse(fs.readFileSync(path.join(ROOT, 'package.json'), 'utf-8'));
|
|
const version = fs.readFileSync(path.join(ROOT, 'VERSION'), 'utf-8').trim();
|
|
expect(pkg.version).toBe(version);
|
|
});
|
|
|
|
test('generated files are fresh (match --dry-run)', () => {
|
|
const result = Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--dry-run'], {
|
|
cwd: ROOT,
|
|
stdout: 'pipe',
|
|
stderr: 'pipe',
|
|
});
|
|
expect(result.exitCode).toBe(0);
|
|
const output = result.stdout.toString();
|
|
// Every skill should be FRESH
|
|
for (const skill of ALL_SKILLS) {
|
|
const file = skill.dir === '.' ? 'SKILL.md' : `${skill.dir}/SKILL.md`;
|
|
expect(output).toContain(`FRESH: ${file}`);
|
|
}
|
|
expect(output).not.toContain('STALE');
|
|
});
|
|
|
|
test('no generated SKILL.md contains unresolved placeholders', () => {
|
|
for (const skill of ALL_SKILLS) {
|
|
const content = fs.readFileSync(path.join(ROOT, skill.dir, 'SKILL.md'), 'utf-8');
|
|
const unresolved = content.match(/\{\{[A-Z_]+\}\}/g);
|
|
expect(unresolved).toBeNull();
|
|
}
|
|
});
|
|
|
|
test('templates contain placeholders', () => {
|
|
const rootTmpl = fs.readFileSync(path.join(ROOT, 'SKILL.md.tmpl'), 'utf-8');
|
|
expect(rootTmpl).toContain('{{COMMAND_REFERENCE}}');
|
|
expect(rootTmpl).toContain('{{SNAPSHOT_FLAGS}}');
|
|
expect(rootTmpl).toContain('{{PREAMBLE}}');
|
|
|
|
const browseTmpl = fs.readFileSync(path.join(ROOT, 'browse', 'SKILL.md.tmpl'), 'utf-8');
|
|
expect(browseTmpl).toContain('{{COMMAND_REFERENCE}}');
|
|
expect(browseTmpl).toContain('{{SNAPSHOT_FLAGS}}');
|
|
expect(browseTmpl).toContain('{{PREAMBLE}}');
|
|
});
|
|
|
|
test('generated SKILL.md contains operational self-improvement (replaced contributor mode)', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
expect(content).not.toContain('Contributor Mode');
|
|
expect(content).not.toContain('gstack_contributor');
|
|
expect(content).not.toContain('contributor-logs');
|
|
expect(content).toContain('Operational Self-Improvement');
|
|
expect(content).toContain('gstack-learnings-log');
|
|
expect(content).toContain('gstack-learnings-search --limit 3');
|
|
});
|
|
|
|
test('generated SKILL.md with LEARNINGS_LOG contains operational type', () => {
|
|
// Check a skill that has LEARNINGS_LOG (e.g., review)
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('operational');
|
|
});
|
|
|
|
test('generated SKILL.md contains session awareness', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('_SESSIONS');
|
|
expect(content).toContain('RECOMMENDATION');
|
|
});
|
|
|
|
test('generated SKILL.md contains branch detection', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('_BRANCH');
|
|
expect(content).toContain('git branch --show-current');
|
|
});
|
|
|
|
test('tier 2+ skills contain ELI16 simplification rules (AskUserQuestion format)', () => {
|
|
// Root SKILL.md is tier 1 (no AskUserQuestion format). Check a tier 2+ skill instead.
|
|
const content = fs.readFileSync(path.join(ROOT, 'cso', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('No raw function names');
|
|
expect(content).toContain('plain English');
|
|
});
|
|
|
|
test('tier 1 skills do NOT contain AskUserQuestion format', () => {
|
|
// Use benchmark (tier 1) instead of root — root SKILL.md gets overwritten by Codex test setup
|
|
const content = fs.readFileSync(path.join(ROOT, 'benchmark', 'SKILL.md'), 'utf-8');
|
|
expect(content).not.toContain('## AskUserQuestion Format');
|
|
expect(content).not.toContain('## Completeness Principle');
|
|
});
|
|
|
|
test('generated SKILL.md contains telemetry line', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('skill-usage.jsonl');
|
|
expect(content).toContain('~/.gstack/analytics');
|
|
});
|
|
|
|
test('preamble .pending-* glob is zsh-safe (uses find, not shell glob)', () => {
|
|
for (const skill of ALL_SKILLS) {
|
|
const content = fs.readFileSync(path.join(ROOT, skill.dir, 'SKILL.md'), 'utf-8');
|
|
if (!content.includes('.pending-')) continue;
|
|
// Must NOT have a bare shell glob ".pending-*" outside of find's -name argument
|
|
expect(content).not.toMatch(/for _PF in [^\n]*\/\.pending-\*/);
|
|
// Must use find to avoid zsh NOMATCH error on glob expansion
|
|
expect(content).toContain("find ~/.gstack/analytics -maxdepth 1 -name '.pending-*'");
|
|
}
|
|
});
|
|
|
|
test('bash blocks with shell globs are zsh-safe (setopt guard or find)', () => {
|
|
for (const skill of ALL_SKILLS) {
|
|
const content = fs.readFileSync(path.join(ROOT, skill.dir, 'SKILL.md'), 'utf-8');
|
|
const bashBlocks = [...content.matchAll(/```bash\n([\s\S]*?)```/g)].map(m => m[1]);
|
|
|
|
for (const block of bashBlocks) {
|
|
const lines = block.split('\n');
|
|
|
|
for (const line of lines) {
|
|
const trimmed = line.trimStart();
|
|
if (trimmed.startsWith('#')) continue;
|
|
if (!trimmed.includes('*')) continue;
|
|
// Skip lines where * is inside find -name, git pathspecs, or $(find)
|
|
if (/\bfind\b/.test(trimmed)) continue;
|
|
if (/\bgit\b/.test(trimmed)) continue;
|
|
if (/\$\(find\b/.test(trimmed)) continue;
|
|
|
|
// Check 1: "for VAR in <glob>" must use $(find ...) — caught above by the
|
|
// $(find check, so any surviving for-in with a glob pattern is a violation
|
|
if (/\bfor\s+\w+\s+in\b/.test(trimmed) && /\*\./.test(trimmed)) {
|
|
throw new Error(
|
|
`Unsafe for-in glob in ${skill.dir}/SKILL.md: "${trimmed}". ` +
|
|
`Use \`for f in $(find ... -name '*.ext')\` for zsh compatibility.`
|
|
);
|
|
}
|
|
|
|
// Check 2: ls/cat/rm/grep with glob file args must have setopt guard
|
|
const isGlobCmd = /\b(?:ls|cat|rm|grep)\b/.test(trimmed) &&
|
|
/(?:\/\*[a-z.*]|\*\.[a-z])/.test(trimmed);
|
|
if (isGlobCmd) {
|
|
expect(block).toContain('setopt +o nomatch');
|
|
}
|
|
}
|
|
}
|
|
}
|
|
});
|
|
|
|
test('preamble-using skills have correct skill name in telemetry', () => {
|
|
const PREAMBLE_SKILLS = [
|
|
{ dir: '.', name: 'gstack' },
|
|
{ dir: 'ship', name: 'ship' },
|
|
{ dir: 'review', name: 'review' },
|
|
{ dir: 'qa', name: 'qa' },
|
|
{ dir: 'retro', name: 'retro' },
|
|
];
|
|
for (const skill of PREAMBLE_SKILLS) {
|
|
const content = fs.readFileSync(path.join(ROOT, skill.dir, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain(`"skill":"${skill.name}"`);
|
|
}
|
|
});
|
|
|
|
test('qa and qa-only templates use QA_METHODOLOGY placeholder', () => {
|
|
const qaTmpl = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md.tmpl'), 'utf-8');
|
|
expect(qaTmpl).toContain('{{QA_METHODOLOGY}}');
|
|
|
|
const qaOnlyTmpl = fs.readFileSync(path.join(ROOT, 'qa-only', 'SKILL.md.tmpl'), 'utf-8');
|
|
expect(qaOnlyTmpl).toContain('{{QA_METHODOLOGY}}');
|
|
});
|
|
|
|
test('QA_METHODOLOGY appears expanded in both qa and qa-only generated files', () => {
|
|
const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
|
const qaOnlyContent = fs.readFileSync(path.join(ROOT, 'qa-only', 'SKILL.md'), 'utf-8');
|
|
|
|
// Both should contain the health score rubric
|
|
expect(qaContent).toContain('Health Score Rubric');
|
|
expect(qaOnlyContent).toContain('Health Score Rubric');
|
|
|
|
// Both should contain framework guidance
|
|
expect(qaContent).toContain('Framework-Specific Guidance');
|
|
expect(qaOnlyContent).toContain('Framework-Specific Guidance');
|
|
|
|
// Both should contain the important rules
|
|
expect(qaContent).toContain('Important Rules');
|
|
expect(qaOnlyContent).toContain('Important Rules');
|
|
|
|
// Both should contain the 6 phases
|
|
expect(qaContent).toContain('Phase 1');
|
|
expect(qaOnlyContent).toContain('Phase 1');
|
|
expect(qaContent).toContain('Phase 6');
|
|
expect(qaOnlyContent).toContain('Phase 6');
|
|
});
|
|
|
|
test('qa-only has no-fix guardrails', () => {
|
|
const qaOnlyContent = fs.readFileSync(path.join(ROOT, 'qa-only', 'SKILL.md'), 'utf-8');
|
|
expect(qaOnlyContent).toContain('Never fix bugs');
|
|
expect(qaOnlyContent).toContain('NEVER fix anything');
|
|
// Should not have Edit, Glob, or Grep in allowed-tools.
|
|
// Scope to frontmatter (between the first two --- lines) — the body can
|
|
// legitimately mention these tool names in prose (e.g., Claude model
|
|
// overlay says "prefer Read, Edit, Write, Glob, Grep over Bash").
|
|
const fmMatch = qaOnlyContent.match(/^---\n([\s\S]*?)\n---/);
|
|
expect(fmMatch).not.toBeNull();
|
|
const frontmatter = fmMatch![1];
|
|
expect(frontmatter).toMatch(/allowed-tools:/);
|
|
expect(frontmatter).not.toMatch(/allowed-tools:[\s\S]*?- Edit/);
|
|
expect(frontmatter).not.toMatch(/allowed-tools:[\s\S]*?- Glob/);
|
|
expect(frontmatter).not.toMatch(/allowed-tools:[\s\S]*?- Grep/);
|
|
});
|
|
|
|
test('qa has fix-loop tools and phases', () => {
|
|
const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
|
// Should have Edit, Glob, Grep in allowed-tools
|
|
expect(qaContent).toContain('Edit');
|
|
expect(qaContent).toContain('Glob');
|
|
expect(qaContent).toContain('Grep');
|
|
// Should have fix-loop phases
|
|
expect(qaContent).toContain('Phase 7');
|
|
expect(qaContent).toContain('Phase 8');
|
|
expect(qaContent).toContain('Fix Loop');
|
|
expect(qaContent).toContain('Triage');
|
|
expect(qaContent).toContain('WTF');
|
|
});
|
|
});
|
|
|
|
describe('BASE_BRANCH_DETECT resolver', () => {
|
|
// Find a generated SKILL.md that uses the placeholder (ship is guaranteed to)
|
|
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
|
|
test('resolver output contains PR base detection command', () => {
|
|
expect(shipContent).toContain('gh pr view --json baseRefName');
|
|
});
|
|
|
|
test('resolver output contains repo default branch detection command', () => {
|
|
expect(shipContent).toContain('gh repo view --json defaultBranchRef');
|
|
});
|
|
|
|
test('resolver output contains fallback to main', () => {
|
|
expect(shipContent).toMatch(/fall\s*back\s+to\s+`main`/i);
|
|
});
|
|
|
|
test('resolver output uses "the base branch" phrasing', () => {
|
|
expect(shipContent).toContain('the base branch');
|
|
});
|
|
|
|
test('resolver output contains GitLab CLI commands', () => {
|
|
expect(shipContent).toContain('glab');
|
|
});
|
|
|
|
test('resolver output contains git-native fallback', () => {
|
|
expect(shipContent).toContain('git symbolic-ref');
|
|
});
|
|
|
|
test('resolver output mentions GitLab platform', () => {
|
|
expect(shipContent).toMatch(/gitlab/i);
|
|
});
|
|
});
|
|
|
|
describe('GitLab support in generated skills', () => {
|
|
const retroContent = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
|
const shipSkillContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
|
|
test('retro contains GitLab MR number extraction', () => {
|
|
expect(retroContent).toContain('[#!]');
|
|
});
|
|
|
|
test('retro uses BASE_BRANCH_DETECT (contains glab)', () => {
|
|
expect(retroContent).toContain('glab');
|
|
});
|
|
|
|
test('ship contains glab mr create', () => {
|
|
expect(shipSkillContent).toContain('glab mr create');
|
|
});
|
|
|
|
test('ship checks .gitlab-ci.yml', () => {
|
|
expect(shipSkillContent).toContain('.gitlab-ci.yml');
|
|
});
|
|
});
|
|
|
|
/**
|
|
* Quality evals — catch description regressions.
|
|
*
|
|
* These test that generated output is *useful for an AI agent*,
|
|
* not just structurally valid. Each test targets a specific
|
|
* regression we actually shipped and caught in review.
|
|
*/
|
|
describe('description quality evals', () => {
|
|
// Regression: snapshot flags lost value hints (-d <N>, -s <sel>, -o <path>)
|
|
test('snapshot flags with values include value hints in output', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
for (const flag of SNAPSHOT_FLAGS) {
|
|
if (flag.takesValue) {
|
|
expect(flag.valueHint).toBeDefined();
|
|
expect(content).toContain(`${flag.short} ${flag.valueHint}`);
|
|
}
|
|
}
|
|
});
|
|
|
|
// Regression: "is" lost the valid states enum
|
|
test('is command lists valid state values', () => {
|
|
const desc = COMMAND_DESCRIPTIONS['is'].description;
|
|
for (const state of ['visible', 'hidden', 'enabled', 'disabled', 'checked', 'editable', 'focused']) {
|
|
expect(desc).toContain(state);
|
|
}
|
|
});
|
|
|
|
// Regression: "press" lost common key examples
|
|
test('press command lists example keys', () => {
|
|
const desc = COMMAND_DESCRIPTIONS['press'].description;
|
|
expect(desc).toContain('Enter');
|
|
expect(desc).toContain('Tab');
|
|
expect(desc).toContain('Escape');
|
|
});
|
|
|
|
// Regression: "console" lost --errors filter note
|
|
test('console command describes --errors behavior', () => {
|
|
const desc = COMMAND_DESCRIPTIONS['console'].description;
|
|
expect(desc).toContain('--errors');
|
|
});
|
|
|
|
// Regression: snapshot -i lost "@e refs" context
|
|
test('snapshot -i mentions @e refs', () => {
|
|
const flag = SNAPSHOT_FLAGS.find(f => f.short === '-i')!;
|
|
expect(flag.description).toContain('@e');
|
|
});
|
|
|
|
// Regression: snapshot -C lost "@c refs" context
|
|
test('snapshot -C mentions @c refs', () => {
|
|
const flag = SNAPSHOT_FLAGS.find(f => f.short === '-C')!;
|
|
expect(flag.description).toContain('@c');
|
|
});
|
|
|
|
// Guard: every description must be at least 8 chars (catches empty or stub descriptions)
|
|
test('all command descriptions have meaningful length', () => {
|
|
for (const [cmd, meta] of Object.entries(COMMAND_DESCRIPTIONS)) {
|
|
expect(meta.description.length).toBeGreaterThanOrEqual(8);
|
|
}
|
|
});
|
|
|
|
// Guard: snapshot flag descriptions must be at least 10 chars
|
|
test('all snapshot flag descriptions have meaningful length', () => {
|
|
for (const flag of SNAPSHOT_FLAGS) {
|
|
expect(flag.description.length).toBeGreaterThanOrEqual(10);
|
|
}
|
|
});
|
|
|
|
// Guard: descriptions must not contain pipe (breaks markdown table cells)
|
|
// Usage strings are backtick-wrapped in the table so pipes there are safe.
|
|
test('no command description contains pipe character', () => {
|
|
for (const [cmd, meta] of Object.entries(COMMAND_DESCRIPTIONS)) {
|
|
expect(meta.description).not.toContain('|');
|
|
}
|
|
});
|
|
|
|
// Guard: generated output uses → not ->
|
|
test('generated SKILL.md uses unicode arrows', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
// Check the Tips section specifically (where we regressed -> from →)
|
|
const tipsSection = content.slice(content.indexOf('## Tips'));
|
|
expect(tipsSection).toContain('→');
|
|
expect(tipsSection).not.toContain('->');
|
|
});
|
|
});
|
|
|
|
describe('REVIEW_DASHBOARD resolver', () => {
|
|
const REVIEW_SKILLS = ['plan-ceo-review', 'plan-eng-review', 'plan-design-review'];
|
|
|
|
for (const skill of REVIEW_SKILLS) {
|
|
test(`review dashboard appears in ${skill} generated file`, () => {
|
|
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('gstack-review');
|
|
expect(content).toContain('REVIEW READINESS DASHBOARD');
|
|
});
|
|
}
|
|
|
|
test('review dashboard appears in ship generated file', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('reviews.jsonl');
|
|
expect(content).toContain('REVIEW READINESS DASHBOARD');
|
|
});
|
|
|
|
test('dashboard treats review as a valid Eng Review source', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('plan-eng-review, review, plan-design-review');
|
|
expect(content).toContain('`review` (diff-scoped pre-landing review)');
|
|
expect(content).toContain('`plan-eng-review` (plan-stage architecture review)');
|
|
expect(content).toContain('from either \\`review\\` or \\`plan-eng-review\\`');
|
|
});
|
|
|
|
test('shared dashboard propagates review source to plan-eng-review', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('plan-eng-review, review, plan-design-review');
|
|
expect(content).toContain('`review` (diff-scoped pre-landing review)');
|
|
});
|
|
|
|
test('resolver output contains key dashboard elements', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('VERDICT');
|
|
expect(content).toContain('CLEARED');
|
|
expect(content).toContain('Eng Review');
|
|
expect(content).toContain('7 days');
|
|
expect(content).toContain('Design Review');
|
|
expect(content).toContain('skip_eng_review');
|
|
});
|
|
|
|
test('dashboard bash block includes git HEAD for staleness detection', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('git rev-parse --short HEAD');
|
|
expect(content).toContain('---HEAD---');
|
|
});
|
|
|
|
test('dashboard includes staleness detection prose', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Staleness detection');
|
|
expect(content).toContain('commit');
|
|
});
|
|
|
|
for (const skill of REVIEW_SKILLS) {
|
|
test(`${skill} contains review chaining section`, () => {
|
|
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Review Chaining');
|
|
});
|
|
|
|
test(`${skill} Review Log includes commit field`, () => {
|
|
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('"commit"');
|
|
});
|
|
}
|
|
|
|
test('plan-ceo-review chaining mentions eng and design reviews', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('/plan-eng-review');
|
|
expect(content).toContain('/plan-design-review');
|
|
});
|
|
|
|
test('plan-eng-review chaining mentions design and ceo reviews', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('/plan-design-review');
|
|
expect(content).toContain('/plan-ceo-review');
|
|
});
|
|
|
|
test('plan-design-review chaining mentions eng, ceo, and design skills', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-design-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('/plan-eng-review');
|
|
expect(content).toContain('/plan-ceo-review');
|
|
expect(content).toContain('/design-shotgun');
|
|
expect(content).toContain('/design-html');
|
|
});
|
|
|
|
test('ship does NOT contain review chaining', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
expect(content).not.toContain('Review Chaining');
|
|
});
|
|
});
|
|
|
|
// ─── Test Coverage Audit Resolver Tests ─────────────────────
|
|
|
|
describe('TEST_COVERAGE_AUDIT placeholders', () => {
|
|
const planSkill = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
|
|
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
|
|
test('plan and ship modes share codepath tracing methodology', () => {
|
|
// Review mode delegates test coverage to the Testing specialist subagent (Review Army)
|
|
const sharedPhrases = [
|
|
'Trace data flow',
|
|
'Diagram the execution',
|
|
'Quality scoring rubric',
|
|
'★★★',
|
|
'★★',
|
|
'GAP',
|
|
];
|
|
for (const phrase of sharedPhrases) {
|
|
expect(planSkill).toContain(phrase);
|
|
expect(shipSkill).toContain(phrase);
|
|
}
|
|
// Plan mode traces the plan, not a git diff
|
|
expect(planSkill).toContain('Trace every codepath in the plan');
|
|
expect(planSkill).not.toContain('git diff origin');
|
|
// Ship mode traces the diff
|
|
expect(shipSkill).toContain('Trace every codepath changed');
|
|
});
|
|
|
|
test('review mode uses Review Army for specialist dispatch', () => {
|
|
expect(reviewSkill).toContain('Review Army');
|
|
expect(reviewSkill).toContain('Specialist Dispatch');
|
|
expect(reviewSkill).toContain('testing.md');
|
|
});
|
|
|
|
test('plan and ship modes include E2E decision matrix', () => {
|
|
// Review mode delegates to Testing specialist
|
|
for (const skill of [planSkill, shipSkill]) {
|
|
expect(skill).toContain('E2E Test Decision Matrix');
|
|
expect(skill).toContain('→E2E');
|
|
expect(skill).toContain('→EVAL');
|
|
}
|
|
});
|
|
|
|
test('plan and ship modes include regression rule', () => {
|
|
// Review mode delegates to Testing specialist
|
|
for (const skill of [planSkill, shipSkill]) {
|
|
expect(skill).toContain('REGRESSION RULE');
|
|
expect(skill).toContain('IRON RULE');
|
|
}
|
|
});
|
|
|
|
test('plan and ship modes include test framework detection', () => {
|
|
// Review mode delegates to Testing specialist
|
|
for (const skill of [planSkill, shipSkill]) {
|
|
expect(skill).toContain('Test Framework Detection');
|
|
expect(skill).toContain('CLAUDE.md');
|
|
}
|
|
});
|
|
|
|
test('plan mode adds tests to plan + includes test plan artifact', () => {
|
|
expect(planSkill).toContain('Add missing tests to the plan');
|
|
expect(planSkill).toContain('eng-review-test-plan');
|
|
expect(planSkill).toContain('Test Plan Artifact');
|
|
});
|
|
|
|
test('ship mode auto-generates tests + includes before/after count', () => {
|
|
expect(shipSkill).toContain('Generate tests for uncovered paths');
|
|
expect(shipSkill).toContain('Before/after test count');
|
|
expect(shipSkill).toContain('30 code paths max');
|
|
expect(shipSkill).toContain('ship-test-plan');
|
|
});
|
|
|
|
test('review mode uses Fix-First + Review Army for specialist coverage', () => {
|
|
expect(reviewSkill).toContain('Fix-First');
|
|
expect(reviewSkill).toContain('INFORMATIONAL');
|
|
// Review Army handles test coverage via Testing specialist subagent
|
|
expect(reviewSkill).toContain('Review Army');
|
|
expect(reviewSkill).toContain('Testing');
|
|
});
|
|
|
|
test('plan mode does NOT include ship-specific content', () => {
|
|
expect(planSkill).not.toContain('Before/after test count');
|
|
expect(planSkill).not.toContain('30 code paths max');
|
|
expect(planSkill).not.toContain('ship-test-plan');
|
|
});
|
|
|
|
test('review mode does NOT include test plan artifact', () => {
|
|
expect(reviewSkill).not.toContain('Test Plan Artifact');
|
|
expect(reviewSkill).not.toContain('eng-review-test-plan');
|
|
expect(reviewSkill).not.toContain('ship-test-plan');
|
|
});
|
|
|
|
test('review/specialists/ directory has all expected checklist files', () => {
|
|
const specDir = path.join(ROOT, 'review', 'specialists');
|
|
const expected = [
|
|
'testing.md',
|
|
'maintainability.md',
|
|
'security.md',
|
|
'performance.md',
|
|
'data-migration.md',
|
|
'api-contract.md',
|
|
'red-team.md',
|
|
];
|
|
for (const f of expected) {
|
|
expect(fs.existsSync(path.join(specDir, f))).toBe(true);
|
|
}
|
|
});
|
|
|
|
test('each specialist file has standard header with scope and output format', () => {
|
|
const specDir = path.join(ROOT, 'review', 'specialists');
|
|
const files = fs.readdirSync(specDir).filter(f => f.endsWith('.md'));
|
|
for (const f of files) {
|
|
const content = fs.readFileSync(path.join(specDir, f), 'utf-8');
|
|
// All specialist files must have Scope and Output/JSON in header
|
|
expect(content).toContain('Scope:');
|
|
expect(content.toLowerCase()).toMatch(/output|json/);
|
|
// Must define NO FINDINGS behavior
|
|
expect(content).toContain('NO FINDINGS');
|
|
}
|
|
});
|
|
|
|
// Regression guard: ship output contains key phrases from before the refactor
|
|
test('ship SKILL.md regression guard — key phrases preserved', () => {
|
|
const regressionPhrases = [
|
|
'100% coverage is the goal',
|
|
'ASCII coverage diagram',
|
|
'processPayment',
|
|
'refundPayment',
|
|
'billing.test.ts',
|
|
'checkout.e2e.ts',
|
|
'COVERAGE:',
|
|
'QUALITY:',
|
|
'GAPS:',
|
|
'Code paths:',
|
|
'User flows:',
|
|
];
|
|
for (const phrase of regressionPhrases) {
|
|
expect(shipSkill).toContain(phrase);
|
|
}
|
|
});
|
|
|
|
test('ship SKILL.md contains review army specialist dispatch', () => {
|
|
expect(shipSkill).toContain('Specialist Dispatch');
|
|
expect(shipSkill).toContain('Step 9.1');
|
|
expect(shipSkill).toContain('Step 9.2');
|
|
});
|
|
|
|
test('ship SKILL.md contains cross-review finding dedup', () => {
|
|
expect(shipSkill).toContain('Cross-review finding dedup');
|
|
expect(shipSkill).toContain('Step 9.3');
|
|
});
|
|
|
|
test('ship SKILL.md contains re-run idempotency behavior', () => {
|
|
expect(shipSkill).toContain('Re-run behavior (idempotency)');
|
|
expect(shipSkill).toContain('Never skip a verification step');
|
|
});
|
|
});
|
|
|
|
// --- {{TEST_FAILURE_TRIAGE}} resolver tests ---
|
|
|
|
describe('TEST_FAILURE_TRIAGE resolver', () => {
|
|
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
|
|
test('contains all 4 triage steps', () => {
|
|
expect(shipSkill).toContain('Step T1: Classify each failure');
|
|
expect(shipSkill).toContain('Step T2: Handle in-branch failures');
|
|
expect(shipSkill).toContain('Step T3: Handle pre-existing failures');
|
|
expect(shipSkill).toContain('Step T4: Execute the chosen action');
|
|
});
|
|
|
|
test('T1 includes classification criteria (in-branch vs pre-existing)', () => {
|
|
expect(shipSkill).toContain('In-branch');
|
|
expect(shipSkill).toContain('Likely pre-existing');
|
|
expect(shipSkill).toContain('git diff origin/');
|
|
});
|
|
|
|
test('T3 branches on REPO_MODE (solo vs collaborative)', () => {
|
|
expect(shipSkill).toContain('REPO_MODE');
|
|
expect(shipSkill).toContain('solo');
|
|
expect(shipSkill).toContain('collaborative');
|
|
});
|
|
|
|
test('solo mode offers fix-now, TODO, and skip options', () => {
|
|
expect(shipSkill).toContain('Investigate and fix now');
|
|
expect(shipSkill).toContain('Add as P0 TODO');
|
|
expect(shipSkill).toContain('Skip');
|
|
});
|
|
|
|
test('collaborative mode offers blame + assign option', () => {
|
|
expect(shipSkill).toContain('Blame + assign GitHub issue');
|
|
expect(shipSkill).toContain('gh issue create');
|
|
});
|
|
|
|
test('defaults ambiguous failures to in-branch (safety)', () => {
|
|
expect(shipSkill).toContain('When ambiguous, default to in-branch');
|
|
});
|
|
});
|
|
|
|
// --- {{PLAN_FILE_REVIEW_REPORT}} resolver tests ---
|
|
|
|
describe('PLAN_FILE_REVIEW_REPORT resolver', () => {
|
|
const REVIEW_SKILLS = ['plan-ceo-review', 'plan-eng-review', 'plan-design-review', 'codex'];
|
|
|
|
for (const skill of REVIEW_SKILLS) {
|
|
test(`plan file review report appears in ${skill} generated file`, () => {
|
|
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('GSTACK REVIEW REPORT');
|
|
});
|
|
}
|
|
|
|
test('resolver output contains key report elements', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Trigger');
|
|
expect(content).toContain('Findings');
|
|
expect(content).toContain('VERDICT');
|
|
expect(content).toContain('/plan-ceo-review');
|
|
expect(content).toContain('/plan-eng-review');
|
|
expect(content).toContain('/plan-design-review');
|
|
expect(content).toContain('/codex review');
|
|
});
|
|
});
|
|
|
|
// --- {{PLAN_COMPLETION_AUDIT}} resolver tests ---
|
|
|
|
describe('PLAN_COMPLETION_AUDIT placeholders', () => {
|
|
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
|
|
test('ship SKILL.md contains plan completion audit step', () => {
|
|
expect(shipSkill).toContain('Plan Completion Audit');
|
|
expect(shipSkill).toContain('Step 8');
|
|
});
|
|
|
|
test('review SKILL.md contains plan completion in scope drift', () => {
|
|
expect(reviewSkill).toContain('Plan File Discovery');
|
|
expect(reviewSkill).toContain('Actionable Item Extraction');
|
|
expect(reviewSkill).toContain('Integration with Scope Drift Detection');
|
|
});
|
|
|
|
test('both modes share plan file discovery methodology', () => {
|
|
expect(shipSkill).toContain('Plan File Discovery');
|
|
expect(reviewSkill).toContain('Plan File Discovery');
|
|
// Both should have conversation context first
|
|
expect(shipSkill).toContain('Conversation context (primary)');
|
|
expect(reviewSkill).toContain('Conversation context (primary)');
|
|
// Both should have grep fallback
|
|
expect(shipSkill).toContain('Content-based search (fallback)');
|
|
expect(reviewSkill).toContain('Content-based search (fallback)');
|
|
});
|
|
|
|
test('ship mode has gate logic for NOT DONE items', () => {
|
|
expect(shipSkill).toContain('NOT DONE');
|
|
expect(shipSkill).toContain('Stop — implement the missing items');
|
|
expect(shipSkill).toContain('Ship anyway — defer');
|
|
expect(shipSkill).toContain('intentionally dropped');
|
|
});
|
|
|
|
test('review mode is INFORMATIONAL only', () => {
|
|
expect(reviewSkill).toContain('INFORMATIONAL');
|
|
expect(reviewSkill).toContain('MISSING REQUIREMENTS');
|
|
expect(reviewSkill).toContain('SCOPE CREEP');
|
|
});
|
|
|
|
test('item extraction has 50-item cap', () => {
|
|
expect(shipSkill).toContain('at most 50 items');
|
|
});
|
|
|
|
test('uses file-level traceability (not commit-level)', () => {
|
|
expect(shipSkill).toContain('Cite the specific file');
|
|
expect(shipSkill).not.toContain('commit-level traceability');
|
|
});
|
|
});
|
|
|
|
// --- {{PLAN_VERIFICATION_EXEC}} resolver tests ---
|
|
|
|
describe('PLAN_VERIFICATION_EXEC placeholder', () => {
|
|
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
|
|
test('ship SKILL.md contains plan verification step', () => {
|
|
expect(shipSkill).toContain('Step 8.1');
|
|
expect(shipSkill).toContain('Plan Verification');
|
|
});
|
|
|
|
test('references /qa-only invocation', () => {
|
|
expect(shipSkill).toContain('qa-only/SKILL.md');
|
|
expect(shipSkill).toContain('qa-only');
|
|
});
|
|
|
|
test('contains localhost reachability check', () => {
|
|
expect(shipSkill).toContain('localhost:3000');
|
|
expect(shipSkill).toContain('NO_SERVER');
|
|
});
|
|
|
|
test('skips gracefully when no verification section', () => {
|
|
expect(shipSkill).toContain('No verification steps found in plan');
|
|
});
|
|
|
|
test('skips gracefully when no dev server', () => {
|
|
expect(shipSkill).toContain('No dev server detected');
|
|
});
|
|
});
|
|
|
|
// --- Coverage gate tests ---
|
|
|
|
describe('Coverage gate in ship', () => {
|
|
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
|
|
test('ship SKILL.md contains coverage gate with thresholds', () => {
|
|
expect(shipSkill).toContain('Coverage gate');
|
|
expect(shipSkill).toContain('>= target');
|
|
expect(shipSkill).toContain('< minimum');
|
|
});
|
|
|
|
test('ship SKILL.md supports configurable thresholds via CLAUDE.md', () => {
|
|
expect(shipSkill).toContain('## Test Coverage');
|
|
expect(shipSkill).toContain('Minimum:');
|
|
expect(shipSkill).toContain('Target:');
|
|
});
|
|
|
|
test('coverage gate skips on parse failure (not block)', () => {
|
|
expect(shipSkill).toContain('could not determine percentage — skipping');
|
|
});
|
|
|
|
test('review SKILL.md delegates coverage to Testing specialist', () => {
|
|
// Coverage audit moved to Testing specialist subagent in Review Army
|
|
expect(reviewSkill).toContain('testing.md');
|
|
expect(reviewSkill).toContain('INFORMATIONAL');
|
|
});
|
|
});
|
|
|
|
// --- Ship metrics logging ---
|
|
|
|
describe('Ship metrics logging', () => {
|
|
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
|
|
test('ship SKILL.md contains metrics persistence step', () => {
|
|
expect(shipSkill).toContain('Step 20');
|
|
expect(shipSkill).toContain('coverage_pct');
|
|
expect(shipSkill).toContain('plan_items_total');
|
|
expect(shipSkill).toContain('plan_items_done');
|
|
expect(shipSkill).toContain('verification_result');
|
|
});
|
|
});
|
|
|
|
// --- Plan file discovery shared helper ---
|
|
|
|
describe('Plan file discovery shared helper', () => {
|
|
// The shared helper should appear in ship (via PLAN_COMPLETION_AUDIT_SHIP)
|
|
// and in review (via PLAN_COMPLETION_AUDIT_REVIEW)
|
|
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
|
|
test('plan file discovery appears in both ship and review', () => {
|
|
expect(shipSkill).toContain('Plan File Discovery');
|
|
expect(reviewSkill).toContain('Plan File Discovery');
|
|
});
|
|
|
|
test('both include conversation context first', () => {
|
|
expect(shipSkill).toContain('Conversation context (primary)');
|
|
expect(reviewSkill).toContain('Conversation context (primary)');
|
|
});
|
|
|
|
test('both include content-based fallback', () => {
|
|
expect(shipSkill).toContain('Content-based search (fallback)');
|
|
expect(reviewSkill).toContain('Content-based search (fallback)');
|
|
});
|
|
});
|
|
|
|
// --- Retro plan completion ---
|
|
|
|
describe('Retro plan completion section', () => {
|
|
const retroSkill = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
|
|
|
test('retro SKILL.md contains plan completion section', () => {
|
|
expect(retroSkill).toContain('### Plan Completion');
|
|
expect(retroSkill).toContain('plan_items_total');
|
|
expect(retroSkill).toContain('Plan Completion This Period');
|
|
});
|
|
});
|
|
|
|
// --- Plan status footer in preamble ---
|
|
|
|
describe('Plan status footer in preamble', () => {
|
|
test('preamble contains plan status footer', () => {
|
|
// Read any skill that uses PREAMBLE
|
|
const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Plan Status Footer');
|
|
expect(content).toContain('GSTACK REVIEW REPORT');
|
|
expect(content).toContain('gstack-review-read');
|
|
expect(content).toContain('ExitPlanMode');
|
|
expect(content).toContain('NO REVIEWS YET');
|
|
});
|
|
});
|
|
|
|
// --- Skill invocation during plan mode in preamble ---
|
|
|
|
describe('Skill invocation during plan mode in preamble', () => {
|
|
test('preamble contains skill invocation plan mode section', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Skill Invocation During Plan Mode');
|
|
expect(content).toContain('precedence over generic plan mode behavior');
|
|
expect(content).toContain('Do not continue the workflow');
|
|
expect(content).toContain('cancel the skill or leave plan mode');
|
|
});
|
|
});
|
|
|
|
// --- {{SPEC_REVIEW_LOOP}} resolver tests ---
|
|
|
|
describe('SPEC_REVIEW_LOOP resolver', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');
|
|
|
|
test('contains all 5 review dimensions', () => {
|
|
for (const dim of ['Completeness', 'Consistency', 'Clarity', 'Scope', 'Feasibility']) {
|
|
expect(content).toContain(dim);
|
|
}
|
|
});
|
|
|
|
test('references Agent tool for subagent dispatch', () => {
|
|
expect(content).toMatch(/Agent.*tool/i);
|
|
});
|
|
|
|
test('specifies max 3 iterations', () => {
|
|
expect(content).toMatch(/3.*iteration|maximum.*3/i);
|
|
});
|
|
|
|
test('includes quality score', () => {
|
|
expect(content).toContain('quality score');
|
|
});
|
|
|
|
test('includes metrics path', () => {
|
|
expect(content).toContain('spec-review.jsonl');
|
|
});
|
|
|
|
test('includes convergence guard', () => {
|
|
expect(content).toMatch(/[Cc]onvergence/);
|
|
});
|
|
|
|
test('includes graceful failure handling', () => {
|
|
expect(content).toMatch(/skip.*review|unavailable/i);
|
|
});
|
|
});
|
|
|
|
// --- {{DESIGN_SKETCH}} resolver tests ---
|
|
|
|
describe('DESIGN_SKETCH resolver', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');
|
|
|
|
test('references DESIGN.md for design system constraints', () => {
|
|
expect(content).toContain('DESIGN.md');
|
|
});
|
|
|
|
test('contains wireframe or sketch terminology', () => {
|
|
expect(content).toMatch(/wireframe|sketch/i);
|
|
});
|
|
|
|
test('references browse binary for rendering', () => {
|
|
expect(content).toContain('$B goto');
|
|
});
|
|
|
|
test('references screenshot capture', () => {
|
|
expect(content).toContain('$B screenshot');
|
|
});
|
|
|
|
test('specifies rough aesthetic', () => {
|
|
expect(content).toMatch(/[Rr]ough|hand-drawn/);
|
|
});
|
|
|
|
test('includes skip conditions', () => {
|
|
expect(content).toMatch(/no UI component|skip/i);
|
|
});
|
|
});
|
|
|
|
// --- {{CODEX_SECOND_OPINION}} resolver tests ---
|
|
|
|
describe('CODEX_SECOND_OPINION resolver', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');
|
|
const codexContent = fs.readFileSync(path.join(ROOT, '.agents', 'skills', 'gstack-office-hours', 'SKILL.md'), 'utf-8');
|
|
|
|
test('Phase 3.5 section appears in office-hours SKILL.md', () => {
|
|
expect(content).toContain('Phase 3.5: Cross-Model Second Opinion');
|
|
});
|
|
|
|
test('contains codex exec invocation', () => {
|
|
expect(content).toContain('codex exec');
|
|
});
|
|
|
|
test('contains opt-in AskUserQuestion text', () => {
|
|
expect(content).toContain('second opinion from an independent AI perspective');
|
|
});
|
|
|
|
test('contains cross-model synthesis instructions', () => {
|
|
expect(content).toMatch(/[Ss]ynthesis/);
|
|
expect(content).toContain('Where Claude agrees with the second opinion');
|
|
});
|
|
|
|
test('contains Claude subagent fallback', () => {
|
|
expect(content).toContain('CODEX_NOT_AVAILABLE');
|
|
expect(content).toContain('Agent tool');
|
|
expect(content).toContain('SECOND OPINION (Claude subagent)');
|
|
});
|
|
|
|
test('contains premise revision check', () => {
|
|
expect(content).toContain('Codex challenged premise');
|
|
});
|
|
|
|
test('contains error handling for auth, timeout, and empty', () => {
|
|
expect(content).toMatch(/[Aa]uth.*fail/);
|
|
expect(content).toMatch(/[Tt]imeout/);
|
|
expect(content).toMatch(/[Ee]mpty response/);
|
|
});
|
|
|
|
test('Codex host variant does NOT contain the Phase 3.5 resolver output', () => {
|
|
// The resolver returns '' for codex host, so the interactive section is stripped.
|
|
// Static template references to "Phase 3.5" in prose/conditionals are fine.
|
|
// Other resolvers (design review lite) may contain CODEX_NOT_AVAILABLE, so we
|
|
// check for Phase 3.5-specific markers only.
|
|
expect(codexContent).not.toContain('Phase 3.5: Cross-Model Second Opinion');
|
|
expect(codexContent).not.toContain('TMPERR_OH');
|
|
expect(codexContent).not.toContain('gstack-codex-oh-');
|
|
});
|
|
});
|
|
|
|
// --- Codex filesystem boundary tests ---
|
|
|
|
describe('Codex filesystem boundary', () => {
|
|
// Skills that call codex exec/review and should contain boundary text
|
|
const CODEX_CALLING_SKILLS = [
|
|
'codex', // /codex skill — 3 modes
|
|
'autoplan', // /autoplan — CEO/design/eng voices
|
|
'review', // /review — adversarial step resolver
|
|
'ship', // /ship — adversarial step resolver
|
|
'plan-eng-review', // outside voice resolver
|
|
'plan-ceo-review', // outside voice resolver
|
|
'office-hours', // second opinion resolver
|
|
];
|
|
|
|
const BOUNDARY_MARKER = 'Do NOT read or execute any';
|
|
|
|
test('boundary instruction appears in all skills that call codex', () => {
|
|
for (const skill of CODEX_CALLING_SKILLS) {
|
|
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain(BOUNDARY_MARKER);
|
|
}
|
|
});
|
|
|
|
test('codex skill has Filesystem Boundary section', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('## Filesystem Boundary');
|
|
expect(content).toContain('skill definitions meant for a different AI system');
|
|
});
|
|
|
|
test('codex skill has rabbit-hole detection rule', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Detect skill-file rabbit holes');
|
|
expect(content).toContain('gstack-update-check');
|
|
expect(content).toContain('Consider retrying');
|
|
});
|
|
|
|
test('review.ts CODEX_BOUNDARY constant is interpolated into resolver output', () => {
|
|
// The adversarial step resolver should include boundary text in codex exec prompts
|
|
const reviewContent = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
// Boundary should appear near codex exec invocations
|
|
const boundaryIdx = reviewContent.indexOf(BOUNDARY_MARKER);
|
|
const codexExecIdx = reviewContent.indexOf('codex exec');
|
|
// Both must exist and boundary must come before a codex exec call
|
|
expect(boundaryIdx).toBeGreaterThan(-1);
|
|
expect(codexExecIdx).toBeGreaterThan(-1);
|
|
});
|
|
|
|
test('autoplan boundary text avoids host-specific paths for cross-host compatibility', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'autoplan', 'SKILL.md.tmpl'), 'utf-8');
|
|
// autoplan template uses generic 'skills/gstack' pattern instead of host-specific
|
|
// paths like ~/.claude/ or .agents/skills (which break Codex/Claude output tests)
|
|
const boundaryStart = content.indexOf('Filesystem Boundary');
|
|
const boundaryEnd = content.indexOf('---', boundaryStart + 1);
|
|
const boundarySection = content.slice(boundaryStart, boundaryEnd);
|
|
expect(boundarySection).not.toContain('~/.claude/');
|
|
expect(boundarySection).not.toContain('.agents/skills');
|
|
expect(boundarySection).toContain('skills/gstack');
|
|
expect(boundarySection).toContain(BOUNDARY_MARKER);
|
|
});
|
|
});
|
|
|
|
// --- {{BENEFITS_FROM}} resolver tests ---
|
|
|
|
describe('BENEFITS_FROM resolver', () => {
|
|
const ceoContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
|
const engContent = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
|
|
|
|
test('plan-ceo-review contains prerequisite skill offer', () => {
|
|
expect(ceoContent).toContain('Prerequisite Skill Offer');
|
|
expect(ceoContent).toContain('/office-hours');
|
|
});
|
|
|
|
test('plan-eng-review contains prerequisite skill offer', () => {
|
|
expect(engContent).toContain('Prerequisite Skill Offer');
|
|
expect(engContent).toContain('/office-hours');
|
|
});
|
|
|
|
test('offer includes graceful decline', () => {
|
|
expect(ceoContent).toContain('No worries');
|
|
});
|
|
|
|
test('skills without benefits-from do NOT have prerequisite offer', () => {
|
|
const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
|
expect(qaContent).not.toContain('Prerequisite Skill Offer');
|
|
});
|
|
|
|
test('inline invocation — no "another window" language', () => {
|
|
expect(ceoContent).not.toContain('another window');
|
|
expect(engContent).not.toContain('another window');
|
|
});
|
|
|
|
test('inline invocation — read-and-follow path present', () => {
|
|
expect(ceoContent).toContain('office-hours/SKILL.md');
|
|
expect(engContent).toContain('office-hours/SKILL.md');
|
|
});
|
|
|
|
test('BENEFITS_FROM delegates to INVOKE_SKILL pattern', () => {
|
|
// Should contain the INVOKE_SKILL-style loading prose (not the old manual skip list)
|
|
expect(engContent).toContain('Follow its instructions from top to bottom');
|
|
expect(engContent).toContain('skipping these sections');
|
|
expect(ceoContent).toContain('Follow its instructions from top to bottom');
|
|
});
|
|
});
|
|
|
|
// --- {{INVOKE_SKILL}} resolver tests ---
|
|
|
|
describe('INVOKE_SKILL resolver', () => {
|
|
const ceoContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
|
|
|
test('plan-ceo-review uses INVOKE_SKILL for mid-session office-hours fallback', () => {
|
|
// The mid-session detection path should use INVOKE_SKILL-generated prose
|
|
expect(ceoContent).toContain('office-hours/SKILL.md');
|
|
expect(ceoContent).toContain('Follow its instructions from top to bottom');
|
|
});
|
|
|
|
test('INVOKE_SKILL output includes default skip list', () => {
|
|
expect(ceoContent).toContain('Preamble (run first)');
|
|
expect(ceoContent).toContain('Telemetry (run last)');
|
|
expect(ceoContent).toContain('AskUserQuestion Format');
|
|
});
|
|
|
|
test('INVOKE_SKILL output includes error handling', () => {
|
|
expect(ceoContent).toContain('If unreadable');
|
|
expect(ceoContent).toContain('Could not load');
|
|
});
|
|
|
|
test('template uses {{INVOKE_SKILL:office-hours}} placeholder', () => {
|
|
const tmpl = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md.tmpl'), 'utf-8');
|
|
expect(tmpl).toContain('{{INVOKE_SKILL:office-hours}}');
|
|
});
|
|
});
|
|
|
|
// --- {{CHANGELOG_WORKFLOW}} resolver tests ---
|
|
|
|
describe('CHANGELOG_WORKFLOW resolver', () => {
|
|
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
|
|
test('ship SKILL.md contains changelog workflow', () => {
|
|
expect(shipContent).toContain('CHANGELOG (auto-generate)');
|
|
expect(shipContent).toContain('git log <base>..HEAD --oneline');
|
|
});
|
|
|
|
test('changelog workflow includes cross-check step', () => {
|
|
expect(shipContent).toContain('Cross-check');
|
|
expect(shipContent).toContain('Every commit must map to at least one bullet point');
|
|
});
|
|
|
|
test('changelog workflow includes voice guidance', () => {
|
|
expect(shipContent).toContain('Lead with what the user can now **do**');
|
|
});
|
|
|
|
test('template uses {{CHANGELOG_WORKFLOW}} placeholder', () => {
|
|
const tmpl = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md.tmpl'), 'utf-8');
|
|
expect(tmpl).toContain('{{CHANGELOG_WORKFLOW}}');
|
|
// Should NOT contain the old inline changelog content
|
|
expect(tmpl).not.toContain('Group commits by theme');
|
|
});
|
|
|
|
test('changelog workflow includes keep-changelog format', () => {
|
|
expect(shipContent).toContain('### Added');
|
|
expect(shipContent).toContain('### Fixed');
|
|
});
|
|
});
|
|
|
|
// --- Parameterized resolver infrastructure tests ---
|
|
|
|
describe('parameterized resolver support', () => {
|
|
test('gen-skill-docs regex handles colon-separated args', () => {
|
|
// Verify the template containing {{INVOKE_SKILL:office-hours}} was processed
|
|
// without leaving unresolved placeholders
|
|
const ceoContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
|
expect(ceoContent).not.toMatch(/\{\{INVOKE_SKILL:[^}]+\}\}/);
|
|
});
|
|
|
|
test('templates with parameterized resolvers pass unresolved check', () => {
|
|
// All generated SKILL.md files should have no unresolved {{...}} placeholders
|
|
const skillDirs = fs.readdirSync(ROOT).filter(d =>
|
|
fs.existsSync(path.join(ROOT, d, 'SKILL.md'))
|
|
);
|
|
for (const dir of skillDirs) {
|
|
const content = fs.readFileSync(path.join(ROOT, dir, 'SKILL.md'), 'utf-8');
|
|
const unresolved = content.match(/\{\{[A-Z_]+(?::[^}]*)?\}\}/g);
|
|
if (unresolved) {
|
|
throw new Error(`${dir}/SKILL.md has unresolved placeholders: ${unresolved.join(', ')}`);
|
|
}
|
|
}
|
|
});
|
|
});
|
|
|
|
// --- Preamble routing injection tests ---
|
|
|
|
describe('preamble routing injection', () => {
|
|
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
|
|
test('preamble bash checks for routing section in CLAUDE.md', () => {
|
|
expect(shipContent).toContain('grep -q "## Skill routing" CLAUDE.md');
|
|
expect(shipContent).toContain('HAS_ROUTING');
|
|
});
|
|
|
|
test('preamble bash reads routing_declined config', () => {
|
|
expect(shipContent).toContain('routing_declined');
|
|
expect(shipContent).toContain('ROUTING_DECLINED');
|
|
});
|
|
|
|
test('preamble includes routing injection AskUserQuestion', () => {
|
|
expect(shipContent).toContain('Add routing rules to CLAUDE.md');
|
|
expect(shipContent).toContain("I'll invoke skills manually");
|
|
});
|
|
|
|
test('routing injection respects prior decline', () => {
|
|
expect(shipContent).toContain('ROUTING_DECLINED');
|
|
expect(shipContent).toMatch(/routing_declined.*true/);
|
|
});
|
|
|
|
test('routing injection only fires when all conditions met', () => {
|
|
// Must be: HAS_ROUTING=no AND ROUTING_DECLINED=false AND PROACTIVE_PROMPTED=yes
|
|
expect(shipContent).toContain('HAS_ROUTING');
|
|
expect(shipContent).toContain('ROUTING_DECLINED');
|
|
expect(shipContent).toContain('PROACTIVE_PROMPTED');
|
|
});
|
|
|
|
test('routing section content includes key routing rules', () => {
|
|
expect(shipContent).toContain('invoke office-hours');
|
|
expect(shipContent).toContain('invoke investigate');
|
|
expect(shipContent).toContain('invoke ship');
|
|
expect(shipContent).toContain('invoke qa');
|
|
});
|
|
});
|
|
|
|
// --- {{DESIGN_OUTSIDE_VOICES}} resolver tests ---
|
|
|
|
describe('DESIGN_OUTSIDE_VOICES resolver', () => {
|
|
test('plan-design-review contains outside voices section', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-design-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Design Outside Voices');
|
|
expect(content).toContain('CODEX_AVAILABLE');
|
|
expect(content).toContain('LITMUS SCORECARD');
|
|
});
|
|
|
|
test('design-review contains outside voices section', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'design-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Design Outside Voices');
|
|
expect(content).toContain('source audit');
|
|
});
|
|
|
|
test('design-consultation contains outside voices section', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'design-consultation', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Design Outside Voices');
|
|
expect(content).toContain('design direction');
|
|
});
|
|
|
|
test('branches correctly per skillName — different prompts', () => {
|
|
const planContent = fs.readFileSync(path.join(ROOT, 'plan-design-review', 'SKILL.md'), 'utf-8');
|
|
const consultContent = fs.readFileSync(path.join(ROOT, 'design-consultation', 'SKILL.md'), 'utf-8');
|
|
// plan-design-review uses analytical prompt (high reasoning)
|
|
expect(planContent).toContain('model_reasoning_effort="high"');
|
|
// design-consultation uses creative prompt (medium reasoning)
|
|
expect(consultContent).toContain('model_reasoning_effort="medium"');
|
|
});
|
|
});
|
|
|
|
// --- {{DESIGN_HARD_RULES}} resolver tests ---
|
|
|
|
describe('DESIGN_HARD_RULES resolver', () => {
|
|
test('plan-design-review Pass 4 contains hard rules', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-design-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Design Hard Rules');
|
|
expect(content).toContain('Classifier');
|
|
expect(content).toContain('MARKETING/LANDING PAGE');
|
|
expect(content).toContain('APP UI');
|
|
});
|
|
|
|
test('design-review contains hard rules', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'design-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Design Hard Rules');
|
|
});
|
|
|
|
test('includes all 3 rule sets', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-design-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Landing page rules');
|
|
expect(content).toContain('App UI rules');
|
|
expect(content).toContain('Universal rules');
|
|
});
|
|
|
|
test('references shared AI slop blacklist items', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-design-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('3-column feature grid');
|
|
expect(content).toContain('Purple/violet/indigo');
|
|
});
|
|
|
|
test('includes OpenAI hard rejection criteria', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-design-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Generic SaaS card grid');
|
|
expect(content).toContain('Carousel with no narrative purpose');
|
|
});
|
|
|
|
test('includes OpenAI litmus checks', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-design-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Brand/product unmistakable');
|
|
expect(content).toContain('premium with all decorative shadows removed');
|
|
});
|
|
});
|
|
|
|
// --- Extended DESIGN_SKETCH resolver tests ---
|
|
|
|
describe('DESIGN_SKETCH extended with outside voices', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');
|
|
|
|
test('contains outside design voices step', () => {
|
|
expect(content).toContain('Outside design voices');
|
|
});
|
|
|
|
test('offers opt-in via AskUserQuestion', () => {
|
|
expect(content).toContain('outside design perspectives');
|
|
});
|
|
|
|
test('still contains original wireframe steps', () => {
|
|
expect(content).toContain('wireframe');
|
|
expect(content).toContain('$B goto');
|
|
});
|
|
});
|
|
|
|
// --- Extended DESIGN_REVIEW_LITE resolver tests ---
|
|
|
|
describe('DESIGN_REVIEW_LITE extended with Codex', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
|
|
test('contains Codex design voice block', () => {
|
|
expect(content).toContain('Codex design voice');
|
|
expect(content).toContain('CODEX (design)');
|
|
});
|
|
|
|
test('still contains original checklist steps', () => {
|
|
expect(content).toContain('design-checklist.md');
|
|
expect(content).toContain('SCOPE_FRONTEND');
|
|
});
|
|
|
|
});
|
|
|
|
// ─── Codex Generation Tests ─────────────────────────────────
|
|
|
|
describe('Codex generation (--host codex)', () => {
|
|
const AGENTS_DIR = path.join(ROOT, '.agents', 'skills');
|
|
|
|
// .agents/ is gitignored (v0.11.2.0) — generate on demand for tests
|
|
Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'codex'], {
|
|
cwd: ROOT, stdout: 'pipe', stderr: 'pipe',
|
|
});
|
|
|
|
// Dynamic discovery of expected Codex skills: all templates except /codex
|
|
// Also excludes skills where .agents/skills/{name} is a symlink back to the repo root
|
|
// (vendored dev mode — gen-skill-docs skips these to avoid overwriting Claude SKILL.md)
|
|
const CODEX_SKILLS = (() => {
|
|
const skills: Array<{ dir: string; codexName: string }> = [];
|
|
const isSymlinkLoop = (codexName: string): boolean => {
|
|
const agentSkillDir = path.join(ROOT, '.agents', 'skills', codexName);
|
|
try {
|
|
return fs.realpathSync(agentSkillDir) === fs.realpathSync(ROOT);
|
|
} catch { return false; }
|
|
};
|
|
if (fs.existsSync(path.join(ROOT, 'SKILL.md.tmpl'))) {
|
|
if (!isSymlinkLoop('gstack')) {
|
|
skills.push({ dir: '.', codexName: 'gstack' });
|
|
}
|
|
}
|
|
for (const entry of fs.readdirSync(ROOT, { withFileTypes: true })) {
|
|
if (!entry.isDirectory() || entry.name.startsWith('.') || entry.name === 'node_modules') continue;
|
|
if (entry.name === 'codex') continue; // /codex is excluded from Codex output
|
|
if (!fs.existsSync(path.join(ROOT, entry.name, 'SKILL.md.tmpl'))) continue;
|
|
const codexName = entry.name.startsWith('gstack-') ? entry.name : `gstack-${entry.name}`;
|
|
if (isSymlinkLoop(codexName)) continue;
|
|
skills.push({ dir: entry.name, codexName });
|
|
}
|
|
return skills;
|
|
})();
|
|
|
|
test('--host codex generates correct output paths', () => {
|
|
for (const skill of CODEX_SKILLS) {
|
|
const skillMd = path.join(AGENTS_DIR, skill.codexName, 'SKILL.md');
|
|
expect(fs.existsSync(skillMd)).toBe(true);
|
|
}
|
|
});
|
|
|
|
test('root gstack bundle has OpenAI metadata for Codex skill browsing', () => {
|
|
const rootMetadata = path.join(ROOT, 'agents', 'openai.yaml');
|
|
expect(fs.existsSync(rootMetadata)).toBe(true);
|
|
const content = fs.readFileSync(rootMetadata, 'utf-8');
|
|
expect(content).toContain('display_name: "gstack"');
|
|
expect(content).toContain('Use $gstack to locate the bundled gstack skills.');
|
|
expect(content).toContain('allow_implicit_invocation: true');
|
|
});
|
|
|
|
test('externalSkillName mapping: root is gstack, others are gstack-{dir}', () => {
|
|
// Root → gstack
|
|
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack', 'SKILL.md'))).toBe(true);
|
|
// Subdirectories → gstack-{dir}
|
|
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack-review', 'SKILL.md'))).toBe(true);
|
|
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack-ship', 'SKILL.md'))).toBe(true);
|
|
// gstack-upgrade doesn't double-prefix
|
|
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack-upgrade', 'SKILL.md'))).toBe(true);
|
|
// No double-prefix: gstack-gstack-upgrade must NOT exist
|
|
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack-gstack-upgrade', 'SKILL.md'))).toBe(false);
|
|
});
|
|
|
|
test('Codex frontmatter has ONLY name + description', () => {
|
|
for (const skill of CODEX_SKILLS) {
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, skill.codexName, 'SKILL.md'), 'utf-8');
|
|
expect(content.startsWith('---\n')).toBe(true);
|
|
const fmEnd = content.indexOf('\n---', 4);
|
|
expect(fmEnd).toBeGreaterThan(0);
|
|
const frontmatter = content.slice(4, fmEnd);
|
|
// Must have name and description
|
|
expect(frontmatter).toContain('name:');
|
|
expect(frontmatter).toContain('description:');
|
|
// Must NOT have allowed-tools, version, or hooks
|
|
expect(frontmatter).not.toContain('allowed-tools:');
|
|
expect(frontmatter).not.toContain('version:');
|
|
expect(frontmatter).not.toContain('hooks:');
|
|
}
|
|
});
|
|
|
|
test('all Codex skills have agents/openai.yaml metadata', () => {
|
|
for (const skill of CODEX_SKILLS) {
|
|
const metadata = path.join(AGENTS_DIR, skill.codexName, 'agents', 'openai.yaml');
|
|
expect(fs.existsSync(metadata)).toBe(true);
|
|
const content = fs.readFileSync(metadata, 'utf-8');
|
|
expect(content).toContain(`display_name: "${skill.codexName}"`);
|
|
expect(content).toContain('short_description:');
|
|
expect(content).toContain('allow_implicit_invocation: true');
|
|
}
|
|
});
|
|
|
|
test('no .claude/skills/ in Codex output', () => {
|
|
for (const skill of CODEX_SKILLS) {
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, skill.codexName, 'SKILL.md'), 'utf-8');
|
|
expect(content).not.toContain('.claude/skills');
|
|
}
|
|
});
|
|
|
|
test('no ~/.claude/ paths in Codex output', () => {
|
|
for (const skill of CODEX_SKILLS) {
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, skill.codexName, 'SKILL.md'), 'utf-8');
|
|
expect(content).not.toContain('~/.claude/');
|
|
}
|
|
});
|
|
|
|
test('/codex skill excluded from Codex output', () => {
|
|
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack-codex', 'SKILL.md'))).toBe(false);
|
|
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack-codex'))).toBe(false);
|
|
});
|
|
|
|
test('Codex review step stripped from Codex-host ship and review', () => {
|
|
const shipContent = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-ship', 'SKILL.md'), 'utf-8');
|
|
expect(shipContent).not.toContain('codex review --base');
|
|
expect(shipContent).not.toContain('CODEX_REVIEWS');
|
|
|
|
const reviewContent = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-review', 'SKILL.md'), 'utf-8');
|
|
expect(reviewContent).not.toContain('codex review --base');
|
|
expect(reviewContent).not.toContain('CODEX_REVIEWS');
|
|
});
|
|
|
|
test('--host codex --dry-run freshness', () => {
|
|
const result = Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'codex', '--dry-run'], {
|
|
cwd: ROOT,
|
|
stdout: 'pipe',
|
|
stderr: 'pipe',
|
|
});
|
|
expect(result.exitCode).toBe(0);
|
|
const output = result.stdout.toString();
|
|
// Every Codex skill should be FRESH
|
|
for (const skill of CODEX_SKILLS) {
|
|
expect(output).toContain(`FRESH: .agents/skills/${skill.codexName}/SKILL.md`);
|
|
}
|
|
expect(output).not.toContain('STALE');
|
|
});
|
|
|
|
test('--host agents alias produces same output as --host codex', () => {
|
|
const codexResult = Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'codex', '--dry-run'], {
|
|
cwd: ROOT,
|
|
stdout: 'pipe',
|
|
stderr: 'pipe',
|
|
});
|
|
const agentsResult = Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'agents', '--dry-run'], {
|
|
cwd: ROOT,
|
|
stdout: 'pipe',
|
|
stderr: 'pipe',
|
|
});
|
|
expect(codexResult.exitCode).toBe(0);
|
|
expect(agentsResult.exitCode).toBe(0);
|
|
// Both should produce the same output (same FRESH lines)
|
|
expect(codexResult.stdout.toString()).toBe(agentsResult.stdout.toString());
|
|
});
|
|
|
|
test('multiline descriptions preserved in Codex output', () => {
|
|
// office-hours has a multiline description — verify it survives the frontmatter transform
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-office-hours', 'SKILL.md'), 'utf-8');
|
|
const fmEnd = content.indexOf('\n---', 4);
|
|
const frontmatter = content.slice(4, fmEnd);
|
|
// Description should span multiple lines (block scalar)
|
|
const descLines = frontmatter.split('\n').filter(l => l.startsWith(' '));
|
|
expect(descLines.length).toBeGreaterThan(1);
|
|
// Verify key phrases survived
|
|
expect(frontmatter).toContain('YC Office Hours');
|
|
});
|
|
|
|
test('hook skills have safety prose and no hooks: in frontmatter', () => {
|
|
const HOOK_SKILLS = ['gstack-careful', 'gstack-freeze', 'gstack-guard'];
|
|
for (const skillName of HOOK_SKILLS) {
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, skillName, 'SKILL.md'), 'utf-8');
|
|
// Must have safety advisory prose
|
|
expect(content).toContain('Safety Advisory');
|
|
// Must NOT have hooks: in frontmatter
|
|
const fmEnd = content.indexOf('\n---', 4);
|
|
const frontmatter = content.slice(4, fmEnd);
|
|
expect(frontmatter).not.toContain('hooks:');
|
|
}
|
|
});
|
|
|
|
test('all Codex SKILL.md files have auto-generated header', () => {
|
|
for (const skill of CODEX_SKILLS) {
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, skill.codexName, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('AUTO-GENERATED from SKILL.md.tmpl');
|
|
expect(content).toContain('Regenerate: bun run gen:skill-docs');
|
|
}
|
|
});
|
|
|
|
test('Codex preamble resolves runtime assets from repo-local or global gstack roots', () => {
|
|
// Check a skill that has a preamble (review is a good candidate)
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('GSTACK_ROOT');
|
|
expect(content).toContain('$_ROOT/.agents/skills/gstack');
|
|
expect(content).toContain('$GSTACK_BIN/gstack-config');
|
|
expect(content).toContain('$GSTACK_ROOT/gstack-upgrade/SKILL.md');
|
|
expect(content).not.toContain('~/.codex/skills/gstack/bin/gstack-config get telemetry');
|
|
});
|
|
|
|
// ─── Path rewriting regression tests ─────────────────────────
|
|
|
|
test('sidecar paths point to .agents/skills/gstack/review/ (not gstack-review/)', () => {
|
|
// Regression: gen-skill-docs rewrote .claude/skills/review → .agents/skills/gstack-review
|
|
// but setup puts sidecars under .agents/skills/gstack/review/. Must match setup layout.
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-review', 'SKILL.md'), 'utf-8');
|
|
// Correct: references to sidecar files use gstack/review/ path
|
|
expect(content).toContain('.agents/skills/gstack/review/checklist.md');
|
|
// design-checklist.md is now referenced via Review Army specialist (Claude only, stripped for Codex)
|
|
// Wrong: must NOT reference gstack-review/checklist.md (file doesn't exist there)
|
|
expect(content).not.toContain('.agents/skills/gstack-review/checklist.md');
|
|
});
|
|
|
|
test('sidecar paths in ship skill point to gstack/review/ for pre-landing review', () => {
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-ship', 'SKILL.md'), 'utf-8');
|
|
// Ship references the review checklist in its pre-landing review step
|
|
if (content.includes('checklist.md')) {
|
|
expect(content).toContain('.agents/skills/gstack/review/');
|
|
expect(content).not.toContain('.agents/skills/gstack-review/checklist');
|
|
}
|
|
});
|
|
|
|
test('greptile-triage sidecar path is correct', () => {
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-review', 'SKILL.md'), 'utf-8');
|
|
if (content.includes('greptile-triage')) {
|
|
expect(content).toContain('.agents/skills/gstack/review/greptile-triage.md');
|
|
expect(content).not.toContain('.agents/skills/gstack-review/greptile-triage');
|
|
}
|
|
});
|
|
|
|
test('all four path rewrite rules produce correct output', () => {
|
|
// Test each of the 4 path rewrite rules individually
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-review', 'SKILL.md'), 'utf-8');
|
|
|
|
// Rule 1: ~/.claude/skills/gstack → $GSTACK_ROOT
|
|
expect(content).not.toContain('~/.claude/skills/gstack');
|
|
expect(content).toContain('$GSTACK_ROOT');
|
|
|
|
// Rule 2: .claude/skills/gstack → .agents/skills/gstack
|
|
expect(content).not.toContain('.claude/skills/gstack');
|
|
|
|
// Rule 3: .claude/skills/review → .agents/skills/gstack/review
|
|
expect(content).not.toContain('.claude/skills/review');
|
|
|
|
// Rule 4: .claude/skills → .agents/skills (catch-all)
|
|
expect(content).not.toContain('.claude/skills');
|
|
});
|
|
|
|
test('path rewrite rules apply to all Codex skills with sidecar references', () => {
|
|
// Verify across ALL generated skills, not just review
|
|
for (const skill of CODEX_SKILLS) {
|
|
const content = fs.readFileSync(path.join(AGENTS_DIR, skill.codexName, 'SKILL.md'), 'utf-8');
|
|
// No skill should reference Claude paths
|
|
expect(content).not.toContain('~/.claude/skills');
|
|
expect(content).not.toContain('.claude/skills');
|
|
if (content.includes('gstack-config') || content.includes('gstack-update-check') || content.includes('gstack-telemetry-log')) {
|
|
expect(content).toContain('$GSTACK_ROOT');
|
|
}
|
|
// If a skill references checklist.md, it must use the correct sidecar path
|
|
if (content.includes('checklist.md') && !content.includes('design-checklist.md')) {
|
|
expect(content).not.toContain('gstack-review/checklist.md');
|
|
}
|
|
}
|
|
});
|
|
|
|
// ─── Claude output regression guard ─────────────────────────
|
|
|
|
test('Claude output unchanged: review skill still uses .claude/skills/ paths', () => {
|
|
// Codex changes must NOT affect Claude output
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('.claude/skills/review/checklist.md');
|
|
expect(content).toContain('~/.claude/skills/gstack');
|
|
// Must NOT contain Codex paths
|
|
expect(content).not.toContain('.agents/skills');
|
|
expect(content).not.toContain('~/.codex/');
|
|
});
|
|
|
|
test('Claude output unchanged: ship skill still uses .claude/skills/ paths', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('~/.claude/skills/gstack');
|
|
expect(content).not.toContain('.agents/skills');
|
|
expect(content).not.toContain('~/.codex/');
|
|
});
|
|
|
|
test('Claude output unchanged: all Claude skills have zero Codex paths', () => {
|
|
for (const skill of ALL_SKILLS) {
|
|
const content = fs.readFileSync(path.join(ROOT, skill.dir, 'SKILL.md'), 'utf-8');
|
|
// pair-agent legitimately documents how Codex agents store credentials.
|
|
// codex + autoplan document the Codex CLI auth file (~/.codex/auth.json)
|
|
// and log path (~/.codex/logs/) — those are user-facing Codex CLI paths,
|
|
// not the gstack Codex host install path.
|
|
if (skill.dir !== 'pair-agent' && skill.dir !== 'codex' && skill.dir !== 'autoplan') {
|
|
expect(content).not.toContain('~/.codex/');
|
|
}
|
|
// gstack-upgrade legitimately references .agents/skills for cross-platform detection
|
|
if (skill.dir !== 'gstack-upgrade') {
|
|
expect(content).not.toContain('.agents/skills');
|
|
}
|
|
}
|
|
});
|
|
|
|
// ─── Design outside voices: Codex host guard ─────────────────
|
|
|
|
test('codex host produces empty outside voices in design-review', () => {
|
|
const codexContent = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-design-review', 'SKILL.md'), 'utf-8');
|
|
expect(codexContent).not.toContain('Design Outside Voices');
|
|
});
|
|
|
|
test('codex host does not include Codex design block in ship', () => {
|
|
const codexContent = fs.readFileSync(path.join(AGENTS_DIR, 'gstack-ship', 'SKILL.md'), 'utf-8');
|
|
expect(codexContent).not.toContain('Codex design voice');
|
|
});
|
|
});
|
|
|
|
// ─── Factory generation tests ────────────────────────────────
|
|
|
|
describe('Factory generation (--host factory)', () => {
|
|
const FACTORY_DIR = path.join(ROOT, '.factory', 'skills');
|
|
|
|
// Generate Factory output for tests
|
|
Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'factory'], {
|
|
cwd: ROOT, stdout: 'pipe', stderr: 'pipe',
|
|
});
|
|
|
|
const FACTORY_SKILLS = (() => {
|
|
const skills: Array<{ dir: string; factoryName: string }> = [];
|
|
const isSymlinkLoop = (name: string): boolean => {
|
|
const factorySkillDir = path.join(ROOT, '.factory', 'skills', name);
|
|
try { return fs.realpathSync(factorySkillDir) === fs.realpathSync(ROOT); }
|
|
catch { return false; }
|
|
};
|
|
if (fs.existsSync(path.join(ROOT, 'SKILL.md.tmpl'))) {
|
|
if (!isSymlinkLoop('gstack')) skills.push({ dir: '.', factoryName: 'gstack' });
|
|
}
|
|
for (const entry of fs.readdirSync(ROOT, { withFileTypes: true })) {
|
|
if (!entry.isDirectory() || entry.name.startsWith('.') || entry.name === 'node_modules') continue;
|
|
if (entry.name === 'codex') continue;
|
|
if (!fs.existsSync(path.join(ROOT, entry.name, 'SKILL.md.tmpl'))) continue;
|
|
const factoryName = entry.name.startsWith('gstack-') ? entry.name : `gstack-${entry.name}`;
|
|
if (isSymlinkLoop(factoryName)) continue;
|
|
skills.push({ dir: entry.name, factoryName });
|
|
}
|
|
return skills;
|
|
})();
|
|
|
|
test('--host factory generates correct output paths', () => {
|
|
for (const skill of FACTORY_SKILLS) {
|
|
const skillMd = path.join(FACTORY_DIR, skill.factoryName, 'SKILL.md');
|
|
expect(fs.existsSync(skillMd)).toBe(true);
|
|
}
|
|
});
|
|
|
|
test('Factory frontmatter has name + description + user-invocable', () => {
|
|
for (const skill of FACTORY_SKILLS) {
|
|
const content = fs.readFileSync(path.join(FACTORY_DIR, skill.factoryName, 'SKILL.md'), 'utf-8');
|
|
const fmEnd = content.indexOf('\n---', 4);
|
|
const frontmatter = content.slice(4, fmEnd);
|
|
expect(frontmatter).toContain('name:');
|
|
expect(frontmatter).toContain('description:');
|
|
expect(frontmatter).toContain('user-invocable: true');
|
|
expect(frontmatter).not.toContain('allowed-tools:');
|
|
expect(frontmatter).not.toContain('preamble-tier:');
|
|
expect(frontmatter).not.toContain('sensitive:');
|
|
}
|
|
});
|
|
|
|
test('sensitive skills have disable-model-invocation', () => {
|
|
const SENSITIVE = ['gstack-ship', 'gstack-land-and-deploy', 'gstack-guard', 'gstack-careful', 'gstack-freeze', 'gstack-unfreeze'];
|
|
for (const name of SENSITIVE) {
|
|
const content = fs.readFileSync(path.join(FACTORY_DIR, name, 'SKILL.md'), 'utf-8');
|
|
const fmEnd = content.indexOf('\n---', 4);
|
|
const frontmatter = content.slice(4, fmEnd);
|
|
expect(frontmatter).toContain('disable-model-invocation: true');
|
|
}
|
|
});
|
|
|
|
test('non-sensitive skills lack disable-model-invocation', () => {
|
|
const NON_SENSITIVE = ['gstack-qa', 'gstack-review', 'gstack-investigate', 'gstack-browse'];
|
|
for (const name of NON_SENSITIVE) {
|
|
const content = fs.readFileSync(path.join(FACTORY_DIR, name, 'SKILL.md'), 'utf-8');
|
|
const fmEnd = content.indexOf('\n---', 4);
|
|
const frontmatter = content.slice(4, fmEnd);
|
|
expect(frontmatter).not.toContain('disable-model-invocation');
|
|
}
|
|
});
|
|
|
|
test('no .claude/skills/ in Factory output', () => {
|
|
for (const skill of FACTORY_SKILLS) {
|
|
const content = fs.readFileSync(path.join(FACTORY_DIR, skill.factoryName, 'SKILL.md'), 'utf-8');
|
|
expect(content).not.toContain('.claude/skills');
|
|
}
|
|
});
|
|
|
|
test('no ~/.claude/skills/ paths in Factory output', () => {
|
|
for (const skill of FACTORY_SKILLS) {
|
|
const content = fs.readFileSync(path.join(FACTORY_DIR, skill.factoryName, 'SKILL.md'), 'utf-8');
|
|
// ~/.claude/skills should be rewritten, but ~/.claude/plans is legitimate
|
|
// (plan directory lookup) and ~/.claude/ in codex prompts is intentional
|
|
expect(content).not.toContain('~/.claude/skills');
|
|
}
|
|
});
|
|
|
|
test('/codex skill excluded from Factory output', () => {
|
|
expect(fs.existsSync(path.join(FACTORY_DIR, 'gstack-codex', 'SKILL.md'))).toBe(false);
|
|
expect(fs.existsSync(path.join(FACTORY_DIR, 'gstack-codex'))).toBe(false);
|
|
});
|
|
|
|
test('Factory keeps Codex integration blocks', () => {
|
|
// Factory users CAN use Codex second opinions (codex exec is a standalone binary)
|
|
const shipContent = fs.readFileSync(path.join(FACTORY_DIR, 'gstack-ship', 'SKILL.md'), 'utf-8');
|
|
expect(shipContent).toContain('codex');
|
|
});
|
|
|
|
test('no agents/openai.yaml in Factory output', () => {
|
|
for (const skill of FACTORY_SKILLS) {
|
|
const yamlPath = path.join(FACTORY_DIR, skill.factoryName, 'agents', 'openai.yaml');
|
|
expect(fs.existsSync(yamlPath)).toBe(false);
|
|
}
|
|
});
|
|
|
|
test('--host droid alias works', () => {
|
|
const factoryResult = Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'factory', '--dry-run'], {
|
|
cwd: ROOT, stdout: 'pipe', stderr: 'pipe',
|
|
});
|
|
const droidResult = Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'droid', '--dry-run'], {
|
|
cwd: ROOT, stdout: 'pipe', stderr: 'pipe',
|
|
});
|
|
expect(factoryResult.exitCode).toBe(0);
|
|
expect(droidResult.exitCode).toBe(0);
|
|
expect(factoryResult.stdout.toString()).toBe(droidResult.stdout.toString());
|
|
});
|
|
|
|
test('--host factory --dry-run freshness', () => {
|
|
const result = Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'factory', '--dry-run'], {
|
|
cwd: ROOT, stdout: 'pipe', stderr: 'pipe',
|
|
});
|
|
expect(result.exitCode).toBe(0);
|
|
const output = result.stdout.toString();
|
|
for (const skill of FACTORY_SKILLS) {
|
|
expect(output).toContain(`FRESH: .factory/skills/${skill.factoryName}/SKILL.md`);
|
|
}
|
|
expect(output).not.toContain('STALE');
|
|
});
|
|
|
|
test('Factory preamble uses .factory paths', () => {
|
|
const content = fs.readFileSync(path.join(FACTORY_DIR, 'gstack-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('GSTACK_ROOT');
|
|
expect(content).toContain('$_ROOT/.factory/skills/gstack');
|
|
expect(content).toContain('$GSTACK_BIN/gstack-config');
|
|
});
|
|
});
|
|
|
|
// ─── Parameterized host smoke tests (config-driven) ─────────
|
|
|
|
import { ALL_HOST_CONFIGS, getExternalHosts } from '../hosts/index';
|
|
|
|
describe('Parameterized host smoke tests', () => {
|
|
for (const hostConfig of getExternalHosts()) {
|
|
describe(`${hostConfig.displayName} (--host ${hostConfig.name})`, () => {
|
|
const hostDir = path.join(ROOT, hostConfig.hostSubdir, 'skills');
|
|
|
|
test('generates output that exists on disk', () => {
|
|
// Generated dir should exist (created by earlier bun run gen:skill-docs --host all)
|
|
if (!fs.existsSync(hostDir)) {
|
|
// Generate if not already done
|
|
Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', hostConfig.name], {
|
|
cwd: ROOT, stdout: 'pipe', stderr: 'pipe',
|
|
});
|
|
}
|
|
expect(fs.existsSync(hostDir)).toBe(true);
|
|
const skills = fs.readdirSync(hostDir).filter(d =>
|
|
fs.existsSync(path.join(hostDir, d, 'SKILL.md'))
|
|
);
|
|
expect(skills.length).toBeGreaterThan(0);
|
|
});
|
|
|
|
test('no .claude/skills path leakage in non-root skills', () => {
|
|
if (!fs.existsSync(hostDir)) return; // skip if not generated
|
|
const skills = fs.readdirSync(hostDir);
|
|
for (const skill of skills) {
|
|
// Skip root gstack skill — it contains preamble with intentional .claude/skills
|
|
// fallback paths for binary lookup and skill prefix instructions
|
|
if (skill === 'gstack') continue;
|
|
const skillMd = path.join(hostDir, skill, 'SKILL.md');
|
|
if (!fs.existsSync(skillMd)) continue;
|
|
const content = fs.readFileSync(skillMd, 'utf-8');
|
|
// Strip bash blocks (which have legitimate fallback paths)
|
|
const noBash = content.replace(/```bash\n[\s\S]*?```/g, '');
|
|
const leaks = noBash.split('\n').filter(l => l.includes('.claude/skills'));
|
|
if (leaks.length > 0) {
|
|
throw new Error(`${skill}: .claude/skills leakage:\n${leaks.slice(0, 3).join('\n')}`);
|
|
}
|
|
}
|
|
});
|
|
|
|
test('frontmatter has name and description', () => {
|
|
if (!fs.existsSync(hostDir)) return;
|
|
const skills = fs.readdirSync(hostDir);
|
|
for (const skill of skills) {
|
|
const skillMd = path.join(hostDir, skill, 'SKILL.md');
|
|
if (!fs.existsSync(skillMd)) continue;
|
|
const content = fs.readFileSync(skillMd, 'utf-8');
|
|
expect(content).toMatch(/^---\n/);
|
|
expect(content).toMatch(/^name:\s/m);
|
|
expect(content).toMatch(/^description:\s/m);
|
|
}
|
|
});
|
|
|
|
test('--dry-run freshness check passes', () => {
|
|
const result = Bun.spawnSync(
|
|
['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', hostConfig.name, '--dry-run'],
|
|
{ cwd: ROOT, stdout: 'pipe', stderr: 'pipe' }
|
|
);
|
|
expect(result.exitCode).toBe(0);
|
|
const output = result.stdout.toString();
|
|
expect(output).not.toContain('STALE');
|
|
});
|
|
|
|
if (hostConfig.generation.skipSkills?.includes('codex')) {
|
|
test('/codex skill excluded', () => {
|
|
expect(fs.existsSync(path.join(hostDir, 'gstack-codex', 'SKILL.md'))).toBe(false);
|
|
});
|
|
}
|
|
});
|
|
}
|
|
});
|
|
|
|
// ─── --host all tests ────────────────────────────────────────
|
|
|
|
describe('--host all', () => {
|
|
test('--host all generates for all registered hosts', () => {
|
|
const result = Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'all', '--dry-run'], {
|
|
cwd: ROOT, stdout: 'pipe', stderr: 'pipe',
|
|
});
|
|
expect(result.exitCode).toBe(0);
|
|
const output = result.stdout.toString();
|
|
// All hosts should appear in output
|
|
expect(output).toContain('FRESH: SKILL.md'); // claude
|
|
for (const hostConfig of getExternalHosts()) {
|
|
expect(output).toContain(`FRESH: ${hostConfig.hostSubdir}/skills/`);
|
|
}
|
|
});
|
|
});
|
|
|
|
// ─── Setup script validation ─────────────────────────────────
|
|
// These tests verify the setup script's install layout matches
|
|
// what the generator produces — catching the bug where setup
|
|
// installed Claude-format source dirs for Codex users.
|
|
|
|
describe('setup script validation', () => {
|
|
const setupContent = fs.readFileSync(path.join(ROOT, 'setup'), 'utf-8');
|
|
|
|
test('setup has separate link functions for Claude and Codex', () => {
|
|
expect(setupContent).toContain('link_claude_skill_dirs');
|
|
expect(setupContent).toContain('link_codex_skill_dirs');
|
|
// Old unified function must not exist
|
|
expect(setupContent).not.toMatch(/^link_skill_dirs\(\)/m);
|
|
});
|
|
|
|
test('Claude install uses link_claude_skill_dirs', () => {
|
|
// The Claude install section (section 4) should use the Claude function
|
|
const claudeSection = setupContent.slice(
|
|
setupContent.indexOf('# 4. Install for Claude'),
|
|
setupContent.indexOf('# 5. Install for Codex')
|
|
);
|
|
expect(claudeSection).toContain('link_claude_skill_dirs');
|
|
expect(claudeSection).not.toContain('link_codex_skill_dirs');
|
|
});
|
|
|
|
test('Codex install uses link_codex_skill_dirs', () => {
|
|
// The Codex install section (section 5) should use the Codex function
|
|
const codexSection = setupContent.slice(
|
|
setupContent.indexOf('# 5. Install for Codex'),
|
|
setupContent.indexOf('# 6. Create')
|
|
);
|
|
expect(codexSection).toContain('create_codex_runtime_root');
|
|
expect(codexSection).toContain('link_codex_skill_dirs');
|
|
expect(codexSection).not.toContain('link_claude_skill_dirs');
|
|
expect(codexSection).not.toContain('ln -snf "$GSTACK_DIR" "$CODEX_GSTACK"');
|
|
});
|
|
|
|
test('Codex install prefers repo-local .agents/skills when setup runs from there', () => {
|
|
expect(setupContent).toContain('SKILLS_PARENT_BASENAME');
|
|
expect(setupContent).toContain('CODEX_REPO_LOCAL=0');
|
|
expect(setupContent).toContain('[ "$SKILLS_PARENT_BASENAME" = ".agents" ]');
|
|
expect(setupContent).toContain('CODEX_REPO_LOCAL=1');
|
|
expect(setupContent).toContain('CODEX_SKILLS="$INSTALL_SKILLS_DIR"');
|
|
});
|
|
|
|
test('setup separates install path from source path for symlinked repo-local installs', () => {
|
|
expect(setupContent).toContain('INSTALL_GSTACK_DIR=');
|
|
expect(setupContent).toContain('SOURCE_GSTACK_DIR=');
|
|
expect(setupContent).toContain('INSTALL_SKILLS_DIR=');
|
|
expect(setupContent).toContain('CODEX_GSTACK="$INSTALL_GSTACK_DIR"');
|
|
expect(setupContent).toContain('link_codex_skill_dirs "$SOURCE_GSTACK_DIR" "$CODEX_SKILLS"');
|
|
});
|
|
|
|
test('Codex installs always create sidecar runtime assets for the real skill target', () => {
|
|
expect(setupContent).toContain('if [ "$INSTALL_CODEX" -eq 1 ]; then');
|
|
expect(setupContent).toContain('create_agents_sidecar "$SOURCE_GSTACK_DIR"');
|
|
});
|
|
|
|
test('link_codex_skill_dirs reads from .agents/skills/', () => {
|
|
// The Codex link function must reference .agents/skills for generated Codex skills
|
|
const fnStart = setupContent.indexOf('link_codex_skill_dirs()');
|
|
const fnEnd = setupContent.indexOf('}', setupContent.indexOf('linked[@]}', fnStart));
|
|
const fnBody = setupContent.slice(fnStart, fnEnd);
|
|
expect(fnBody).toContain('.agents/skills');
|
|
expect(fnBody).toContain('gstack*');
|
|
});
|
|
|
|
test('link_claude_skill_dirs creates real directories with absolute SKILL.md symlinks', () => {
|
|
// Claude links should be real directories with absolute SKILL.md symlinks
|
|
// to ensure Claude Code discovers them as top-level skills (not nested under gstack/)
|
|
const fnStart = setupContent.indexOf('link_claude_skill_dirs()');
|
|
const fnEnd = setupContent.indexOf('}', setupContent.indexOf('linked[@]}', fnStart));
|
|
const fnBody = setupContent.slice(fnStart, fnEnd);
|
|
expect(fnBody).toContain('mkdir -p "$target"');
|
|
expect(fnBody).toContain('ln -snf "$gstack_dir/$dir_name/SKILL.md" "$target/SKILL.md"');
|
|
});
|
|
|
|
// REGRESSION: cleanup functions must handle both old symlinks AND new real-directory pattern
|
|
test('cleanup functions handle real directories with symlinked SKILL.md', () => {
|
|
// cleanup_old_claude_symlinks must detect and remove real dirs with SKILL.md symlinks
|
|
const cleanupOldStart = setupContent.indexOf('cleanup_old_claude_symlinks()');
|
|
const cleanupOldEnd = setupContent.indexOf('}', setupContent.indexOf('cleaned up old', cleanupOldStart));
|
|
const cleanupOldBody = setupContent.slice(cleanupOldStart, cleanupOldEnd);
|
|
expect(cleanupOldBody).toContain('-d "$old_target"');
|
|
expect(cleanupOldBody).toContain('-L "$old_target/SKILL.md"');
|
|
expect(cleanupOldBody).toContain('rm -rf "$old_target"');
|
|
|
|
// cleanup_prefixed_claude_symlinks must also handle the new pattern
|
|
const cleanupPrefixedStart = setupContent.indexOf('cleanup_prefixed_claude_symlinks()');
|
|
const cleanupPrefixedEnd = setupContent.indexOf('}', setupContent.indexOf('cleaned up prefixed', cleanupPrefixedStart));
|
|
const cleanupPrefixedBody = setupContent.slice(cleanupPrefixedStart, cleanupPrefixedEnd);
|
|
expect(cleanupPrefixedBody).toContain('-d "$prefixed_target"');
|
|
expect(cleanupPrefixedBody).toContain('-L "$prefixed_target/SKILL.md"');
|
|
expect(cleanupPrefixedBody).toContain('rm -rf "$prefixed_target"');
|
|
});
|
|
|
|
// REGRESSION: link function must upgrade old directory symlinks
|
|
test('link_claude_skill_dirs removes old directory symlinks before creating real dirs', () => {
|
|
const fnStart = setupContent.indexOf('link_claude_skill_dirs()');
|
|
const fnEnd = setupContent.indexOf('}', setupContent.indexOf('linked[@]}', fnStart));
|
|
const fnBody = setupContent.slice(fnStart, fnEnd);
|
|
// Must check for and remove old symlinks before mkdir
|
|
expect(fnBody).toContain('if [ -L "$target" ]');
|
|
expect(fnBody).toContain('rm -f "$target"');
|
|
});
|
|
|
|
test('setup supports --host auto|claude|codex|kiro|opencode', () => {
|
|
expect(setupContent).toContain('--host');
|
|
expect(setupContent).toContain('claude|codex|kiro|factory|opencode|auto');
|
|
});
|
|
|
|
test('auto mode detects claude, codex, kiro, and opencode binaries', () => {
|
|
expect(setupContent).toContain('command -v claude');
|
|
expect(setupContent).toContain('command -v codex');
|
|
expect(setupContent).toContain('command -v kiro-cli');
|
|
expect(setupContent).toContain('command -v opencode');
|
|
});
|
|
|
|
// T1: Sidecar skip guard — prevents .agents/skills/gstack from being linked as a skill
|
|
test('link_codex_skill_dirs skips the gstack sidecar directory', () => {
|
|
const fnStart = setupContent.indexOf('link_codex_skill_dirs()');
|
|
const fnEnd = setupContent.indexOf('}', setupContent.indexOf('done', fnStart));
|
|
const fnBody = setupContent.slice(fnStart, fnEnd);
|
|
expect(fnBody).toContain('[ "$skill_name" = "gstack" ] && continue');
|
|
});
|
|
|
|
// T2: Dynamic $GSTACK_ROOT paths in generated Codex preambles
|
|
test('generated Codex preambles use dynamic GSTACK_ROOT paths', () => {
|
|
const codexSkillDir = path.join(ROOT, '.agents', 'skills', 'gstack-ship');
|
|
if (!fs.existsSync(codexSkillDir)) return; // skip if .agents/ not generated
|
|
const content = fs.readFileSync(path.join(codexSkillDir, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('GSTACK_ROOT=');
|
|
expect(content).toContain('$GSTACK_BIN/');
|
|
});
|
|
|
|
test('setup supports --host kiro with install section and sed rewrites', () => {
|
|
expect(setupContent).toContain('INSTALL_KIRO=');
|
|
expect(setupContent).toContain('kiro-cli');
|
|
expect(setupContent).toContain('KIRO_SKILLS=');
|
|
expect(setupContent).toContain('~/.kiro/skills/gstack');
|
|
});
|
|
|
|
test('setup supports --host opencode with install section and OpenCode skill path vars', () => {
|
|
expect(setupContent).toContain('INSTALL_OPENCODE=');
|
|
expect(setupContent).toContain('OPENCODE_SKILLS="$HOME/.config/opencode/skills"');
|
|
expect(setupContent).toContain('OPENCODE_GSTACK="$OPENCODE_SKILLS/gstack"');
|
|
});
|
|
|
|
test('setup installs OpenCode skills into a nested gstack runtime root', () => {
|
|
expect(setupContent).toContain('create_opencode_runtime_root');
|
|
expect(setupContent).toContain('.opencode/skills');
|
|
expect(setupContent).toContain('review/specialists');
|
|
expect(setupContent).toContain('qa/templates');
|
|
expect(setupContent).toContain('qa/references');
|
|
expect(setupContent).toContain('dx-hall-of-fame.md');
|
|
});
|
|
|
|
test('create_agents_sidecar links runtime assets', () => {
|
|
// Sidecar must link bin, browse, review, qa
|
|
const fnStart = setupContent.indexOf('create_agents_sidecar()');
|
|
const fnEnd = setupContent.indexOf('}', setupContent.indexOf('done', fnStart));
|
|
const fnBody = setupContent.slice(fnStart, fnEnd);
|
|
expect(fnBody).toContain('bin');
|
|
expect(fnBody).toContain('browse');
|
|
expect(fnBody).toContain('review');
|
|
expect(fnBody).toContain('qa');
|
|
});
|
|
|
|
test('create_codex_runtime_root exposes only runtime assets', () => {
|
|
const fnStart = setupContent.indexOf('create_codex_runtime_root()');
|
|
const fnEnd = setupContent.indexOf('}', setupContent.indexOf('done', setupContent.indexOf('review/', fnStart)));
|
|
const fnBody = setupContent.slice(fnStart, fnEnd);
|
|
expect(fnBody).toContain('gstack/SKILL.md');
|
|
expect(fnBody).toContain('browse/dist');
|
|
expect(fnBody).toContain('browse/bin');
|
|
expect(fnBody).toContain('gstack-upgrade/SKILL.md');
|
|
// Review runtime assets (individual files, not the whole dir)
|
|
expect(fnBody).toContain('checklist.md');
|
|
expect(fnBody).toContain('design-checklist.md');
|
|
expect(fnBody).toContain('greptile-triage.md');
|
|
expect(fnBody).toContain('TODOS-format.md');
|
|
expect(fnBody).not.toContain('ln -snf "$gstack_dir" "$codex_gstack"');
|
|
});
|
|
|
|
test('direct Codex installs are migrated out of ~/.codex/skills/gstack', () => {
|
|
expect(setupContent).toContain('migrate_direct_codex_install');
|
|
expect(setupContent).toContain('$HOME/.gstack/repos/gstack');
|
|
expect(setupContent).toContain('avoid duplicate skill discovery');
|
|
});
|
|
|
|
// --- Symlink prefix tests (PR #503) ---
|
|
|
|
test('link_claude_skill_dirs applies gstack- prefix by default', () => {
|
|
const fnStart = setupContent.indexOf('link_claude_skill_dirs()');
|
|
const fnEnd = setupContent.indexOf('}', setupContent.indexOf('linked[@]}', fnStart));
|
|
const fnBody = setupContent.slice(fnStart, fnEnd);
|
|
expect(fnBody).toContain('SKILL_PREFIX');
|
|
expect(fnBody).toContain('link_name="gstack-$skill_name"');
|
|
});
|
|
|
|
test('link_claude_skill_dirs preserves already-prefixed dirs', () => {
|
|
const fnStart = setupContent.indexOf('link_claude_skill_dirs()');
|
|
const fnEnd = setupContent.indexOf('}', setupContent.indexOf('linked[@]}', fnStart));
|
|
const fnBody = setupContent.slice(fnStart, fnEnd);
|
|
// gstack-* dirs should keep their name (e.g., gstack-upgrade stays gstack-upgrade)
|
|
expect(fnBody).toContain('gstack-*) link_name="$skill_name"');
|
|
});
|
|
|
|
test('setup supports --no-prefix flag', () => {
|
|
expect(setupContent).toContain('--no-prefix');
|
|
expect(setupContent).toContain('SKILL_PREFIX=0');
|
|
});
|
|
|
|
test('cleanup_old_claude_symlinks removes only gstack-pointing symlinks', () => {
|
|
expect(setupContent).toContain('cleanup_old_claude_symlinks');
|
|
const fnStart = setupContent.indexOf('cleanup_old_claude_symlinks()');
|
|
const fnEnd = setupContent.indexOf('}', setupContent.indexOf('removed[@]}', fnStart));
|
|
const fnBody = setupContent.slice(fnStart, fnEnd);
|
|
// Should check readlink before removing
|
|
expect(fnBody).toContain('readlink');
|
|
expect(fnBody).toContain('gstack/*');
|
|
// Should skip already-prefixed dirs
|
|
expect(fnBody).toContain('gstack-*) continue');
|
|
});
|
|
|
|
test('cleanup runs before link when prefix is enabled', () => {
|
|
// In the Claude install section, cleanup should happen before linking
|
|
const claudeInstallSection = setupContent.slice(
|
|
setupContent.indexOf('INSTALL_CLAUDE'),
|
|
setupContent.lastIndexOf('link_claude_skill_dirs')
|
|
);
|
|
expect(claudeInstallSection).toContain('cleanup_old_claude_symlinks');
|
|
});
|
|
|
|
// --- Persistent config + interactive prompt tests ---
|
|
|
|
test('setup reads skill_prefix from config', () => {
|
|
expect(setupContent).toContain('get skill_prefix');
|
|
expect(setupContent).toContain('GSTACK_CONFIG');
|
|
});
|
|
|
|
test('setup supports --prefix flag', () => {
|
|
expect(setupContent).toContain('--prefix)');
|
|
expect(setupContent).toContain('SKILL_PREFIX=1; SKILL_PREFIX_FLAG=1');
|
|
});
|
|
|
|
test('--prefix and --no-prefix persist to config', () => {
|
|
expect(setupContent).toContain('set skill_prefix');
|
|
});
|
|
|
|
test('interactive prompt shows when no config', () => {
|
|
expect(setupContent).toContain('Short names');
|
|
expect(setupContent).toContain('Namespaced');
|
|
expect(setupContent).toContain('Choice [1/2]');
|
|
});
|
|
|
|
test('non-TTY defaults to flat names', () => {
|
|
// Should check if stdin is a TTY before prompting
|
|
expect(setupContent).toContain('-t 0');
|
|
});
|
|
|
|
test('cleanup_prefixed_claude_symlinks exists and uses readlink', () => {
|
|
expect(setupContent).toContain('cleanup_prefixed_claude_symlinks');
|
|
const fnStart = setupContent.indexOf('cleanup_prefixed_claude_symlinks()');
|
|
const fnEnd = setupContent.indexOf('}', setupContent.indexOf('removed[@]}', fnStart));
|
|
const fnBody = setupContent.slice(fnStart, fnEnd);
|
|
expect(fnBody).toContain('readlink');
|
|
expect(fnBody).toContain('gstack-$skill_name');
|
|
});
|
|
|
|
test('reverse cleanup runs before link when prefix is disabled', () => {
|
|
const claudeInstallSection = setupContent.slice(
|
|
setupContent.indexOf('INSTALL_CLAUDE'),
|
|
setupContent.lastIndexOf('link_claude_skill_dirs')
|
|
);
|
|
expect(claudeInstallSection).toContain('cleanup_prefixed_claude_symlinks');
|
|
});
|
|
|
|
test('welcome message references SKILL_PREFIX', () => {
|
|
// gstack-upgrade is always called gstack-upgrade (it's the actual dir name)
|
|
// but the welcome section should exist near the prefix logic
|
|
expect(setupContent).toContain('Run /gstack-upgrade anytime');
|
|
});
|
|
});
|
|
|
|
describe('discover-skills hidden directory filtering', () => {
|
|
test('discoverTemplates skips dot-prefixed directories', () => {
|
|
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-discover-'));
|
|
try {
|
|
// Create a hidden dir with a template (should be excluded)
|
|
fs.mkdirSync(path.join(tmpDir, '.hidden'), { recursive: true });
|
|
fs.writeFileSync(path.join(tmpDir, '.hidden', 'SKILL.md.tmpl'), '---\nname: evil\n---\ntest');
|
|
// Create a visible dir with a template (should be included)
|
|
fs.mkdirSync(path.join(tmpDir, 'visible'), { recursive: true });
|
|
fs.writeFileSync(path.join(tmpDir, 'visible', 'SKILL.md.tmpl'), '---\nname: good\n---\ntest');
|
|
|
|
const { discoverTemplates } = require('../scripts/discover-skills');
|
|
const results = discoverTemplates(tmpDir);
|
|
const dirs = results.map((r: { tmpl: string }) => r.tmpl);
|
|
|
|
expect(dirs).toContain('visible/SKILL.md.tmpl');
|
|
expect(dirs).not.toContain('.hidden/SKILL.md.tmpl');
|
|
} finally {
|
|
fs.rmSync(tmpDir, { recursive: true, force: true });
|
|
}
|
|
});
|
|
});
|
|
|
|
describe('telemetry', () => {
|
|
test('generated SKILL.md contains telemetry start block', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('_TEL_START');
|
|
expect(content).toContain('_SESSION_ID');
|
|
expect(content).toContain('TELEMETRY:');
|
|
expect(content).toContain('TEL_PROMPTED:');
|
|
expect(content).toContain('gstack-config get telemetry');
|
|
});
|
|
|
|
test('generated SKILL.md contains telemetry opt-in prompt', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('.telemetry-prompted');
|
|
expect(content).toContain('Help gstack get better');
|
|
expect(content).toContain('gstack-config set telemetry community');
|
|
expect(content).toContain('gstack-config set telemetry anonymous');
|
|
expect(content).toContain('gstack-config set telemetry off');
|
|
});
|
|
|
|
test('generated SKILL.md contains telemetry epilogue', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Telemetry (run last)');
|
|
expect(content).toContain('gstack-telemetry-log');
|
|
expect(content).toContain('_TEL_END');
|
|
expect(content).toContain('_TEL_DUR');
|
|
expect(content).toContain('SKILL_NAME');
|
|
expect(content).toContain('OUTCOME');
|
|
expect(content).toContain('PLAN MODE EXCEPTION');
|
|
});
|
|
|
|
test('generated SKILL.md contains pending marker handling', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('.pending');
|
|
expect(content).toContain('_pending_finalize');
|
|
});
|
|
|
|
test('telemetry blocks appear in all skill files that use PREAMBLE', () => {
|
|
const skills = ['qa', 'ship', 'review', 'plan-ceo-review', 'plan-eng-review', 'retro'];
|
|
for (const skill of skills) {
|
|
const skillPath = path.join(ROOT, skill, 'SKILL.md');
|
|
if (fs.existsSync(skillPath)) {
|
|
const content = fs.readFileSync(skillPath, 'utf-8');
|
|
expect(content).toContain('_TEL_START');
|
|
expect(content).toContain('Telemetry (run last)');
|
|
}
|
|
}
|
|
});
|
|
});
|
|
|
|
describe('community fixes wave', () => {
|
|
// Helper to get all generated SKILL.md files
|
|
function getAllSkillMds(): Array<{ name: string; content: string }> {
|
|
const results: Array<{ name: string; content: string }> = [];
|
|
const rootPath = path.join(ROOT, 'SKILL.md');
|
|
if (fs.existsSync(rootPath)) {
|
|
results.push({ name: 'root', content: fs.readFileSync(rootPath, 'utf-8') });
|
|
}
|
|
for (const entry of fs.readdirSync(ROOT, { withFileTypes: true })) {
|
|
if (!entry.isDirectory() || entry.name.startsWith('.') || entry.name === 'node_modules') continue;
|
|
const skillPath = path.join(ROOT, entry.name, 'SKILL.md');
|
|
if (fs.existsSync(skillPath)) {
|
|
results.push({ name: entry.name, content: fs.readFileSync(skillPath, 'utf-8') });
|
|
}
|
|
}
|
|
return results;
|
|
}
|
|
|
|
// #594 — Discoverability: every SKILL.md.tmpl description contains "gstack"
|
|
test('every SKILL.md.tmpl description contains "gstack"', () => {
|
|
for (const skill of ALL_SKILLS) {
|
|
const tmplPath = skill.dir === '.' ? path.join(ROOT, 'SKILL.md.tmpl') : path.join(ROOT, skill.dir, 'SKILL.md.tmpl');
|
|
const content = fs.readFileSync(tmplPath, 'utf-8');
|
|
const desc = extractDescription(content);
|
|
expect(desc.toLowerCase()).toContain('gstack');
|
|
}
|
|
});
|
|
|
|
// #594 — Discoverability: first line of each description is under 120 chars
|
|
test('every SKILL.md.tmpl description first line is under 120 chars', () => {
|
|
for (const skill of ALL_SKILLS) {
|
|
const tmplPath = skill.dir === '.' ? path.join(ROOT, 'SKILL.md.tmpl') : path.join(ROOT, skill.dir, 'SKILL.md.tmpl');
|
|
const content = fs.readFileSync(tmplPath, 'utf-8');
|
|
const desc = extractDescription(content);
|
|
const firstLine = desc.split('\n')[0];
|
|
expect(firstLine.length).toBeLessThanOrEqual(120);
|
|
}
|
|
});
|
|
|
|
// #573 — Feature signals: ship/SKILL.md contains feature signal detection
|
|
test('ship/SKILL.md contains feature signal detection in Step 4', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
|
expect(content.toLowerCase()).toContain('feature signal');
|
|
});
|
|
|
|
// #510 — Context warnings: no SKILL.md contains "running low on context"
|
|
test('no generated SKILL.md contains "running low on context"', () => {
|
|
const skills = getAllSkillMds();
|
|
for (const { name, content } of skills) {
|
|
expect(content).not.toContain('running low on context');
|
|
}
|
|
});
|
|
|
|
// #510 — Context warnings: plan-eng-review has explicit anti-warning
|
|
test('plan-eng-review/SKILL.md contains "Do not preemptively warn"', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Do not preemptively warn');
|
|
});
|
|
|
|
// #474 — Safety Net: no SKILL.md uses find with -delete
|
|
test('no generated SKILL.md contains find with -delete flag', () => {
|
|
const skills = getAllSkillMds();
|
|
for (const { name, content } of skills) {
|
|
// Match find commands that use -delete (but not prose mentioning the word "delete")
|
|
const lines = content.split('\n');
|
|
for (const line of lines) {
|
|
if (line.includes('find ') && line.includes('-delete')) {
|
|
throw new Error(`${name}/SKILL.md contains find with -delete: ${line.trim()}`);
|
|
}
|
|
}
|
|
}
|
|
});
|
|
|
|
// #467 — Telemetry: preamble JSONL writes are gated by telemetry setting
|
|
test('preamble JSONL writes are inside telemetry conditional', () => {
|
|
const preamble = fs.readFileSync(path.join(ROOT, 'scripts/resolvers/preamble.ts'), 'utf-8');
|
|
// Find all skill-usage.jsonl write lines
|
|
const lines = preamble.split('\n');
|
|
for (let i = 0; i < lines.length; i++) {
|
|
if (lines[i].includes('skill-usage.jsonl') && lines[i].includes('>>')) {
|
|
// Look backwards for a telemetry conditional within 5 lines
|
|
let foundConditional = false;
|
|
for (let j = i - 1; j >= Math.max(0, i - 5); j--) {
|
|
if (lines[j].includes('_TEL') && lines[j].includes('off')) {
|
|
foundConditional = true;
|
|
break;
|
|
}
|
|
}
|
|
expect(foundConditional).toBe(true);
|
|
}
|
|
}
|
|
});
|
|
});
|
|
|
|
describe('codex commands must not use inline $(git rev-parse --show-toplevel) for cwd', () => {
|
|
// Regression test: inline $(git rev-parse --show-toplevel) in codex exec -C
|
|
// or codex review without cd evaluates in whatever cwd the background shell
|
|
// inherits, which may be a different project in Conductor workspaces.
|
|
// The fix is to resolve _REPO_ROOT eagerly at the top of each bash block.
|
|
|
|
// Scan all source files that could contain codex commands
|
|
// Use Bun.Glob to avoid ELOOP from .claude/skills/gstack symlink back to ROOT
|
|
const tmplGlob = new Bun.Glob('**/*.tmpl');
|
|
const sourceFiles = [
|
|
...Array.from(tmplGlob.scanSync({ cwd: ROOT, followSymlinks: false })),
|
|
...fs.readdirSync(path.join(ROOT, 'scripts/resolvers'))
|
|
.filter(f => f.endsWith('.ts'))
|
|
.map(f => `scripts/resolvers/${f}`),
|
|
'scripts/gen-skill-docs.ts',
|
|
];
|
|
|
|
test('no codex exec command uses inline $(git rev-parse --show-toplevel) in -C flag', () => {
|
|
const violations: string[] = [];
|
|
for (const rel of sourceFiles) {
|
|
const abs = path.join(ROOT, rel);
|
|
if (!fs.existsSync(abs)) continue;
|
|
const content = fs.readFileSync(abs, 'utf-8');
|
|
const lines = content.split('\n');
|
|
for (let i = 0; i < lines.length; i++) {
|
|
const line = lines[i];
|
|
if (line.includes('codex exec') && line.includes('-C') && line.includes('$(git rev-parse --show-toplevel)')) {
|
|
violations.push(`${rel}:${i + 1}`);
|
|
}
|
|
}
|
|
}
|
|
expect(violations).toEqual([]);
|
|
});
|
|
|
|
test('no generated SKILL.md has codex exec with inline $(git rev-parse --show-toplevel) in -C flag', () => {
|
|
const violations: string[] = [];
|
|
const skillMdGlob = new Bun.Glob('**/SKILL.md');
|
|
const skillMdFiles = Array.from(skillMdGlob.scanSync({ cwd: ROOT, followSymlinks: false }));
|
|
for (const rel of skillMdFiles) {
|
|
const abs = path.join(ROOT, rel);
|
|
if (!fs.existsSync(abs)) continue;
|
|
const content = fs.readFileSync(abs, 'utf-8');
|
|
const lines = content.split('\n');
|
|
for (let i = 0; i < lines.length; i++) {
|
|
const line = lines[i];
|
|
if (line.includes('codex exec') && line.includes('-C') && line.includes('$(git rev-parse --show-toplevel)')) {
|
|
violations.push(`${rel}:${i + 1}`);
|
|
}
|
|
}
|
|
}
|
|
expect(violations).toEqual([]);
|
|
});
|
|
|
|
test('codex review commands must be preceded by cd "$_REPO_ROOT" (no -C support)', () => {
|
|
// codex review does not support -C, so the pattern must be:
|
|
// _REPO_ROOT=$(git rev-parse --show-toplevel) || { ... }
|
|
// cd "$_REPO_ROOT"
|
|
// codex review ...
|
|
// NOT: codex review ... with inline $(git rev-parse --show-toplevel)
|
|
const allFiles = [
|
|
...Array.from(tmplGlob.scanSync({ cwd: ROOT, followSymlinks: false })),
|
|
...Array.from(new Bun.Glob('**/SKILL.md').scanSync({ cwd: ROOT, followSymlinks: false })),
|
|
...fs.readdirSync(path.join(ROOT, 'scripts/resolvers'))
|
|
.filter(f => f.endsWith('.ts'))
|
|
.map(f => `scripts/resolvers/${f}`),
|
|
'scripts/gen-skill-docs.ts',
|
|
];
|
|
const violations: string[] = [];
|
|
for (const rel of allFiles) {
|
|
const abs = path.join(ROOT, rel);
|
|
if (!fs.existsSync(abs)) continue;
|
|
const content = fs.readFileSync(abs, 'utf-8');
|
|
const lines = content.split('\n');
|
|
for (let i = 0; i < lines.length; i++) {
|
|
const line = lines[i];
|
|
// Skip non-executable lines (markdown table cells, prose references)
|
|
if (line.includes('|') && line.includes('`/codex review`')) continue;
|
|
if (line.includes('`codex review`')) continue;
|
|
// Check for codex review with inline $(git rev-parse)
|
|
if (line.includes('codex review') && line.includes('$(git rev-parse --show-toplevel)')) {
|
|
violations.push(`${rel}:${i + 1} — inline git rev-parse in codex review`);
|
|
}
|
|
}
|
|
}
|
|
expect(violations).toEqual([]);
|
|
});
|
|
});
|
|
|
|
// ─── Learnings + Confidence Resolver Tests ─────────────────────
|
|
|
|
describe('LEARNINGS_SEARCH resolver', () => {
|
|
const SEARCH_SKILLS = ['review', 'ship', 'plan-eng-review', 'investigate', 'office-hours', 'plan-ceo-review'];
|
|
|
|
for (const skill of SEARCH_SKILLS) {
|
|
test(`${skill} generated SKILL.md contains learnings search`, () => {
|
|
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Prior Learnings');
|
|
expect(content).toContain('gstack-learnings-search');
|
|
});
|
|
}
|
|
|
|
test('learnings search includes cross-project config check', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('cross_project_learnings');
|
|
expect(content).toContain('--cross-project');
|
|
});
|
|
|
|
test('learnings search includes AskUserQuestion for first-time cross-project opt-in', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Enable cross-project learnings');
|
|
expect(content).toContain('project-scoped only');
|
|
});
|
|
|
|
test('learnings search mentions prior learning applied display format', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Prior learning applied');
|
|
});
|
|
});
|
|
|
|
describe('LEARNINGS_LOG resolver', () => {
|
|
const LOG_SKILLS = ['review', 'retro', 'investigate'];
|
|
|
|
for (const skill of LOG_SKILLS) {
|
|
test(`${skill} generated SKILL.md contains learnings log`, () => {
|
|
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Capture Learnings');
|
|
expect(content).toContain('gstack-learnings-log');
|
|
});
|
|
}
|
|
|
|
test('learnings log documents all type values', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
for (const type of ['pattern', 'pitfall', 'preference', 'architecture', 'tool']) {
|
|
expect(content).toContain(type);
|
|
}
|
|
});
|
|
|
|
test('learnings log documents all source values', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
for (const source of ['observed', 'user-stated', 'inferred', 'cross-model']) {
|
|
expect(content).toContain(source);
|
|
}
|
|
});
|
|
|
|
test('learnings log includes files field for staleness detection', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('"files"');
|
|
expect(content).toContain('staleness detection');
|
|
});
|
|
});
|
|
|
|
describe('CONFIDENCE_CALIBRATION resolver', () => {
|
|
const CONFIDENCE_SKILLS = ['review', 'ship', 'plan-eng-review', 'cso'];
|
|
|
|
for (const skill of CONFIDENCE_SKILLS) {
|
|
test(`${skill} generated SKILL.md contains confidence calibration`, () => {
|
|
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Confidence Calibration');
|
|
expect(content).toContain('confidence score');
|
|
});
|
|
}
|
|
|
|
test('confidence calibration includes scoring rubric with all tiers', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('9-10');
|
|
expect(content).toContain('7-8');
|
|
expect(content).toContain('5-6');
|
|
expect(content).toContain('3-4');
|
|
expect(content).toContain('1-2');
|
|
});
|
|
|
|
test('confidence calibration includes display rules', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('Show normally');
|
|
expect(content).toContain('Suppress from main report');
|
|
});
|
|
|
|
test('confidence calibration includes finding format example', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('[P1] (confidence:');
|
|
expect(content).toContain('SQL injection');
|
|
});
|
|
|
|
test('confidence calibration includes calibration learning feedback loop', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('calibration event');
|
|
expect(content).toContain('Log the corrected pattern');
|
|
});
|
|
|
|
test('skills without confidence calibration do NOT contain it', () => {
|
|
// office-hours and retro do NOT use confidence calibration
|
|
for (const skill of ['office-hours', 'retro']) {
|
|
const content = fs.readFileSync(path.join(ROOT, skill, 'SKILL.md'), 'utf-8');
|
|
expect(content).not.toContain('## Confidence Calibration');
|
|
}
|
|
});
|
|
});
|
|
|
|
describe('gen-skill-docs prefix warning (#620/#578)', () => {
|
|
const { execSync } = require('child_process');
|
|
|
|
test('warns about skill_prefix when config has prefix=true', () => {
|
|
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-prefix-warn-'));
|
|
try {
|
|
// Create a fake ~/.gstack/config.yaml with skill_prefix: true
|
|
const fakeHome = tmpDir;
|
|
const fakeGstack = path.join(fakeHome, '.gstack');
|
|
fs.mkdirSync(fakeGstack, { recursive: true });
|
|
fs.writeFileSync(path.join(fakeGstack, 'config.yaml'), 'skill_prefix: true\n');
|
|
|
|
const output = execSync('bun run scripts/gen-skill-docs.ts', {
|
|
cwd: ROOT,
|
|
env: { ...process.env, HOME: fakeHome },
|
|
encoding: 'utf-8',
|
|
timeout: 30000,
|
|
});
|
|
expect(output).toContain('skill_prefix is true');
|
|
expect(output).toContain('gstack-relink');
|
|
} finally {
|
|
fs.rmSync(tmpDir, { recursive: true, force: true });
|
|
}
|
|
});
|
|
|
|
test('no warning when skill_prefix is false or absent', () => {
|
|
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-prefix-warn-'));
|
|
try {
|
|
const fakeHome = tmpDir;
|
|
const fakeGstack = path.join(fakeHome, '.gstack');
|
|
fs.mkdirSync(fakeGstack, { recursive: true });
|
|
fs.writeFileSync(path.join(fakeGstack, 'config.yaml'), 'skill_prefix: false\n');
|
|
|
|
const output = execSync('bun run scripts/gen-skill-docs.ts', {
|
|
cwd: ROOT,
|
|
env: { ...process.env, HOME: fakeHome },
|
|
encoding: 'utf-8',
|
|
timeout: 30000,
|
|
});
|
|
expect(output).not.toContain('skill_prefix is true');
|
|
} finally {
|
|
fs.rmSync(tmpDir, { recursive: true, force: true });
|
|
}
|
|
});
|
|
});
|
|
|
|
describe('voice-triggers processing', () => {
|
|
const { extractVoiceTriggers, processVoiceTriggers } = require('../scripts/gen-skill-docs') as {
|
|
extractVoiceTriggers: (content: string) => string[];
|
|
processVoiceTriggers: (content: string) => string;
|
|
};
|
|
|
|
test('extractVoiceTriggers parses valid YAML list', () => {
|
|
const content = `---\nname: cso\ndescription: |\n Security audit.\nvoice-triggers:\n - "see-so"\n - "security review"\n---\nBody`;
|
|
const triggers = extractVoiceTriggers(content);
|
|
expect(triggers).toEqual(['see-so', 'security review']);
|
|
});
|
|
|
|
test('extractVoiceTriggers returns [] when no field present', () => {
|
|
const content = `---\nname: qa\ndescription: |\n QA testing.\n---\nBody`;
|
|
expect(extractVoiceTriggers(content)).toEqual([]);
|
|
});
|
|
|
|
test('processVoiceTriggers appends voice triggers to description', () => {
|
|
const content = `---\nname: cso\ndescription: |\n Security audit. (gstack)\nvoice-triggers:\n - "see-so"\n - "security review"\n---\nBody`;
|
|
const result = processVoiceTriggers(content);
|
|
expect(result).toContain('Voice triggers (speech-to-text aliases): "see-so", "security review".');
|
|
});
|
|
|
|
test('processVoiceTriggers strips voice-triggers field from output', () => {
|
|
const content = `---\nname: cso\ndescription: |\n Security audit. (gstack)\nvoice-triggers:\n - "see-so"\n---\nBody`;
|
|
const result = processVoiceTriggers(content);
|
|
expect(result).not.toContain('voice-triggers:');
|
|
});
|
|
|
|
test('processVoiceTriggers returns content unchanged when no voice-triggers', () => {
|
|
const content = `---\nname: qa\ndescription: |\n QA testing.\n---\nBody`;
|
|
expect(processVoiceTriggers(content)).toBe(content);
|
|
});
|
|
|
|
test('generated CSO SKILL.md contains voice triggers in description', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'cso', 'SKILL.md'), 'utf-8');
|
|
expect(content).toContain('"see-so"');
|
|
expect(content).toContain('Voice triggers (speech-to-text aliases):');
|
|
});
|
|
|
|
test('generated CSO SKILL.md does NOT contain raw voice-triggers field', () => {
|
|
const content = fs.readFileSync(path.join(ROOT, 'cso', 'SKILL.md'), 'utf-8');
|
|
const fmEnd = content.indexOf('\n---', 4);
|
|
const frontmatter = content.slice(0, fmEnd);
|
|
expect(frontmatter).not.toContain('voice-triggers:');
|
|
});
|
|
});
|