gstack/test/relink.test.ts
Garry Tan 8ca950f6f1 feat: content security — 4-layer prompt injection defense for pair-agent (#815)
* feat: token registry for multi-agent browser access

Per-agent scoped tokens with read/write/admin/meta command categories,
domain glob restrictions, rate limiting, expiry, and revocation. Setup
key exchange for the /pair-agent ceremony (5-min one-time key → 24h
session token). Idempotent exchange handles tunnel drops. 39 tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
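
The scoped-token checks described above (scope category, domain glob, expiry, revocation) can be sketched roughly as follows. This is an illustrative sketch only: the `ScopedToken` shape, `globToRegExp`, and `isAllowed` are hypothetical names, not the registry's actual API, and rate limiting is left out.

```typescript
// Hypothetical token shape; the real registry's fields are not shown in this log.
type Scope = "read" | "write" | "admin" | "meta";

interface ScopedToken {
  clientId: string;
  scopes: Scope[];
  domains: string[]; // glob patterns such as "*.example.com"; empty means unrestricted (assumption)
  expiresAt: number; // epoch milliseconds
  revoked: boolean;
}

// Turn a simple domain glob into a RegExp: "*" matches any run of characters,
// every other regex metacharacter is escaped.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`, "i");
}

// A request passes only if the token is live, carries the scope, and the
// target hostname matches at least one allowed domain glob.
function isAllowed(
  token: ScopedToken,
  scope: Scope,
  hostname: string,
  now: number = Date.now(),
): boolean {
  if (token.revoked || now >= token.expiresAt) return false;
  if (!token.scopes.includes(scope)) return false;
  if (token.domains.length === 0) return true;
  return token.domains.some((glob) => globToRegExp(glob).test(hostname));
}
```

Checking liveness first keeps an expired or revoked token from passing regardless of its scopes or domains.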

* feat: integrate token registry + scoped auth into browse server

Server changes for multi-agent browser access:
- /connect endpoint: setup key exchange for /pair-agent ceremony
- /token endpoint: root-only minting of scoped sub-tokens
- /token/:clientId DELETE: revoke agent tokens
- /agents endpoint: list connected agents (root-only)
- /health: strips root token when tunnel is active (P0 security fix)
- /command: scope/rate/domain checks via token registry before dispatch
- Idle timer skips shutdown when tunnel is active

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: ngrok tunnel integration + @ngrok/ngrok dependency

BROWSE_TUNNEL=1 env var starts an ngrok tunnel after Bun.serve().
Reads NGROK_AUTHTOKEN from env or ~/.gstack/ngrok.env. Reads
NGROK_DOMAIN for dedicated domain (stable URL). Updates state
file with tunnel URL. Feasibility spike confirmed: SDK works in
compiled Bun binary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: tab isolation for multi-agent browser access

Add per-tab ownership tracking to BrowserManager. Scoped agents
must create their own tab via newtab before writing. Unowned tabs
(pre-existing, user-opened) are root-only for writes. Read access
always allowed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: tab enforcement + POST /pair endpoint + activity attribution

Server-side tab ownership check blocks scoped agents from writing to
unowned tabs. Special-case newtab records ownership for scoped tokens.
POST /pair endpoint creates setup keys for the pairing ceremony.
Activity events now include clientId for attribution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: pair-agent CLI command + instruction block generator

One command to pair a remote agent: $B pair-agent. Creates a setup
key via POST /pair, prints a copy-pasteable instruction block with
curl commands. Smart tunnel fallback (tunnel URL > auto-start >
localhost). Flags: --for HOST, --local HOST, --admin, --client NAME.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: tab isolation + instruction block generator tests

14 tests covering tab ownership lifecycle (access checks, unowned
tabs, transferTab) and instruction block generator (scopes, URLs,
admin flag, troubleshooting section). Fix server-auth test that
used fragile sliceBetween boundaries broken by new endpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.15.9.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: CSO security fixes — token leak, domain bypass, input validation

1. Remove root token from /health endpoint entirely (CSO #1 CRITICAL).
   Origin header is spoofable. Extension reads from ~/.gstack/.auth.json.
2. Add domain check for newtab URL (CSO #5). Previously only goto was
   checked, allowing domain-restricted agents to bypass via newtab.
3. Validate scope values, rateLimit, expiresSeconds in createToken()
   (CSO #4). Rejects invalid scopes and negative values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /pair-agent skill — syntactic sugar for browser sharing

Users remember /pair-agent, not $B pair-agent. The skill walks through
agent selection (OpenClaw, Hermes, Codex, Cursor, generic), local vs
remote setup, tunnel configuration, and includes platform-specific
notes for each agent type. Wraps the CLI command with context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remote browser access reference for paired agents

Full API reference, snapshot→@ref pattern, scopes, tab isolation,
error codes, ngrok setup, and same-machine shortcuts. The instruction
block points here for deeper reading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: improved instruction block with snapshot→@ref pattern

The paste-into-agent instruction block now teaches the snapshot→@ref
workflow (the most powerful browsing pattern), shows the server URL
prominently, and uses clearer formatting. Tests updated to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: smart ngrok detection + auto-tunnel in pair-agent

The pair-agent command now checks ngrok's native config (not just
~/.gstack/ngrok.env) and auto-starts the tunnel when ngrok is
available. The skill template walks users through ngrok install
and auth if not set up, instead of just printing a dead localhost
URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: on-demand tunnel start via POST /tunnel/start

pair-agent now auto-starts the ngrok tunnel without restarting the
server. New POST /tunnel/start endpoint reads authtoken from env,
~/.gstack/ngrok.env, or ngrok's native config. CLI detects ngrok
availability and calls the endpoint automatically. Zero manual steps
when ngrok is installed and authed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pair-agent skill must output the instruction block verbatim

Added CRITICAL instruction: the agent MUST output the full instruction
block so the user can copy it. Previously the agent could summarize
over it, leaving the user with nothing to paste.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: scoped tokens rejected on /command — auth gate ordering bug

The blanket validateAuth() gate (root-only) sat above the /command
endpoint, rejecting all scoped tokens with 401 before they reached
getTokenInfo(). Moved /command above the gate so both root and
scoped tokens are accepted. This was the bug Wintermute hit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: pair-agent auto-launches headed mode before pairing

When pair-agent detects headless mode, it auto-switches to headed
(visible Chromium window) so the user can watch what the remote
agent does. Use --headless to skip this. Fixed compiled binary
path resolution (process.execPath, not process.argv[1] which is
virtual /$bunfs/ in Bun compiled binaries).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: comprehensive tests for auth ordering, tunnel, ngrok, headed mode

16 new tests covering:
- /command sits above blanket auth gate (Wintermute bug)
- /command uses getTokenInfo not validateAuth
- /tunnel/start requires root, checks native ngrok config, returns already_active
- /pair creates setup keys not session tokens
- Tab ownership checked before command dispatch
- Activity events include clientId
- Instruction block teaches snapshot→@ref pattern
- pair-agent auto-headed mode, process.execPath, --headless skip
- isNgrokAvailable checks all 3 sources (gstack env, env var, native config)
- handlePairAgent calls /tunnel/start not server restart

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: chain scope bypass + /health info leak when tunneled

1. Chain command now pre-validates ALL subcommand scopes before
   executing any. A read+meta token can no longer escalate to
   admin via chain (eval, js, cookies were dispatched without
   scope checks). tokenInfo flows through handleMetaCommand into
   the chain handler. Rejects entire chain if any subcommand fails.

2. /health strips sensitive fields (currentUrl, agent.currentMessage,
   session) when tunnel is active. Only operational metadata (status,
   mode, uptime, tabs) exposed to the internet. Previously anyone
   reaching the ngrok URL could surveil browsing activity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
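
The chain fix in item 1 amounts to validating every subcommand before running any of them. A minimal sketch of that idea, assuming a hypothetical `COMMAND_SCOPE` table and hypothetical `validateChain` / `runChain` helpers (the real command-to-scope mapping lives in the server and is not shown here):

```typescript
type Scope = "read" | "write" | "admin" | "meta";

// Hypothetical command-to-scope table, for illustration only.
const COMMAND_SCOPE: Record<string, Scope> = {
  text: "read",
  snapshot: "read",
  click: "write",
  fill: "write",
  eval: "admin",
  js: "admin",
  cookies: "admin",
};

// Returns an error message for the first subcommand the token may not run,
// or null when the whole chain is permitted.
function validateChain(commands: string[], granted: Set<Scope>): string | null {
  for (const cmd of commands) {
    const required = COMMAND_SCOPE[cmd];
    if (required === undefined) return `unknown command: ${cmd}`;
    if (!granted.has(required)) return `scope "${required}" required for ${cmd}`;
  }
  return null;
}

// Validate first, execute second: nothing runs if any subcommand is rejected.
function runChain(
  commands: string[],
  granted: Set<Scope>,
  execute: (cmd: string) => string,
): string[] {
  const error = validateChain(commands, granted);
  if (error !== null) throw new Error(`chain rejected: ${error}`);
  return commands.map(execute);
}
```

Because validation completes before the first `execute` call, a read+meta token that sneaks `eval` into a chain fails without any earlier subcommand having side effects.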

* docs: tout /pair-agent as headline feature in CHANGELOG + README

Lead with what it does for the user: type /pair-agent, paste into
your other agent, done. First time AI agents from different companies
can coordinate through a shared browser with real security boundaries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: expand /pair-agent, /design-shotgun, /design-html in README

Each skill gets a real narrative paragraph explaining the workflow,
not just a table cell. design-shotgun: visual exploration with taste
memory. design-html: production HTML with Pretext computed layout.
pair-agent: cross-vendor AI agent coordination through shared browser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: split handleCommand into handleCommandInternal + HTTP wrapper

Chain subcommands now route through handleCommandInternal for full security
enforcement (scope, domain, tab ownership, rate limiting, content wrapping).
Adds recursion guard for nested chains, rate-limit exemption for chain
subcommands, and activity event suppression (1 event per chain, not per sub).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add content-security.ts with datamarking, envelope, and filter hooks

Four-layer prompt injection defense for pair-agent browser sharing:
- Datamarking: session-scoped watermark for text exfiltration detection
- Content envelope: trust boundary wrapping with ZWSP marker escaping
- Content filter hooks: extensible filter pipeline with warn/block modes
- Built-in URL blocklist: requestbin, pipedream, webhook.site, etc.

BROWSE_CONTENT_FILTER env var controls mode: off|warn|block (default: warn)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
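
As a rough illustration of the `off|warn|block` modes and the URL blocklist, the sketch below parses a mode string and applies a substring blocklist. `parseFilterMode`, `applyUrlBlocklist`, and the `FilterResult` shape are hypothetical; the substring matching and the empty-text behavior in block mode are assumptions, not the behavior of content-security.ts.

```typescript
type FilterMode = "off" | "warn" | "block";

// Unset or unrecognized values fall back to "warn", the documented default.
function parseFilterMode(raw: string | undefined): FilterMode {
  return raw === "off" || raw === "block" || raw === "warn" ? raw : "warn";
}

// Service names taken from the commit message; matching by substring is an assumption.
const BLOCKLIST = ["requestbin", "pipedream", "webhook.site"];

interface FilterResult {
  text: string;
  warnings: string[];
  blocked: boolean;
}

// "off" passes content through, "warn" annotates it, "block" withholds it
// when any blocklisted pattern appears.
function applyUrlBlocklist(text: string, mode: FilterMode): FilterResult {
  if (mode === "off") return { text, warnings: [], blocked: false };
  const lower = text.toLowerCase();
  const warnings = BLOCKLIST
    .filter((name) => lower.includes(name))
    .map((name) => `blocklisted URL pattern: ${name}`);
  const blocked = mode === "block" && warnings.length > 0;
  return { text: blocked ? "" : text, warnings, blocked };
}
```

A caller would typically pass the `BROWSE_CONTENT_FILTER` environment value through `parseFilterMode` once at startup and reuse the resulting mode.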

* feat: centralize content wrapping in handleCommandInternal response path

Single wrapping location replaces fragmented per-handler wrapping:
- Scoped tokens: content filters + datamarking + enhanced envelope
- Root tokens: existing basic wrapping (backward compat)
- Chain subcommands exempt from top-level wrapping (wrapped individually)
- Adds 'attrs' to PAGE_CONTENT_COMMANDS (ARIA value exposure defense)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: hidden element stripping for scoped token text extraction

Detects CSS-hidden elements (opacity, font-size, off-screen, same-color,
clip-path) and ARIA label injection patterns. Marks elements with
data-gstack-hidden, extracts text from a clean clone (no DOM mutation),
then removes markers. Only active for scoped tokens on text command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: snapshot split output format for scoped tokens

Scoped tokens get a split snapshot: trusted @refs section (for click/fill)
separated from untrusted web content in an envelope. Ref names truncated
to 50 chars in trusted section. Root tokens unchanged (backward compat).
Resume command also uses split format for scoped tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add SECURITY section to pair-agent instruction block

Instructs remote agents to treat content inside untrusted envelopes
as potentially malicious. Lists common injection phrases to watch for.
Directs agents to only use @refs from the trusted INTERACTIVE ELEMENTS
section, not from page content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add 4 prompt injection test fixtures

- injection-visible.html: visible injection in product review text
- injection-hidden.html: 7 CSS hiding techniques + ARIA injection + false positive
- injection-social.html: social engineering in legitimate-looking content
- injection-combined.html: all attack types + envelope escape attempt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: comprehensive content security tests (47 tests)

Covers all 4 defense layers:
- Datamarking: marker format, session consistency, text-only application
- Content envelope: wrapping, ZWSP marker escaping, filter warnings
- Content filter hooks: URL blocklist, custom filters, warn/block modes
- Instruction block: SECURITY section content, ordering, generation
- Centralized wrapping: source-level verification of integration
- Chain security: recursion guard, rate-limit exemption, activity suppression
- Hidden element stripping: 7 CSS techniques, ARIA injection, false positives
- Snapshot split format: scoped vs root output, resume integration

Also fixes: visibility:hidden detection, case-insensitive ARIA pattern matching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pair-agent skill compliance + fix all 16 pre-existing test failures

Root cause: pair-agent was added without completing the gen-skill-docs
compliance checklist. All 16 failures traced back to this.

Fixes:
- Sync package.json version to VERSION (0.15.9.0)
- Add "(gstack)" to pair-agent description for discoverability
- Add pair-agent to Codex path exception (legitimately documents ~/.codex/)
- Add CLI_COMMANDS (status, pair-agent, tunnel) to skill parser allowlist
- Regenerate SKILL.md for all hosts (claude, codex, factory, kiro, etc.)
- Update golden file baselines for ship skill
- Fix relink tests: pass GSTACK_INSTALL_DIR to auto-relink calls so they
  use the fast mock install instead of scanning real ~/.claude/skills/gstack

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.15.12.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: E2E exit reason precedence + worktree prune race condition

Two fixes for E2E test reliability:

1. session-runner.ts: error_max_turns was misclassified as error_api
   because is_error flag was checked before subtype. Now known subtypes
   like error_max_turns are preserved even when is_error is set. The
   is_error override only applies when subtype=success (API failure).

2. worktree.ts: pruneStale() now skips worktrees < 1 hour old to avoid
   deleting worktrees from concurrent test runs still in progress.
   Previously any second test execution would kill the first's worktrees.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
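
Both fixes reduce to small ordering and threshold decisions, sketched below. `classifyExit` and `isPrunable` are hypothetical names, and passing every non-success subtype through unchanged is an assumption; the actual session-runner.ts and worktree.ts code is not shown in this log.

```typescript
// Check the subtype before the error flag: only a nominal "success" is
// overridden by is_error (treated as an API failure). Other subtypes,
// such as "error_max_turns", are kept as reported.
function classifyExit(subtype: string, isError: boolean): string {
  if (subtype === "success") return isError ? "error_api" : "success";
  return subtype;
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// A worktree is eligible for pruning only once it is at least an hour old,
// so a concurrent run's fresh worktrees are left alone.
function isPrunable(
  createdAtMs: number,
  nowMs: number,
  minAgeMs: number = ONE_HOUR_MS,
): boolean {
  return nowMs - createdAtMs >= minAgeMs;
}
```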

* fix: restore token in /health for localhost extension auth

The CSO security fix stripped the token from /health to prevent leaking
when tunneled. But the extension needs it to authenticate on localhost.
Now returns token only when not tunneled (safe: localhost-only path).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: verify /health token is localhost-only, never served through tunnel

Updated tests to match the restored token behavior:
- Test 1: token assignment exists AND is inside the !tunnelActive guard
- Test 1b: tunnel branch (else block) does not contain AUTH_TOKEN

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add security rationale for token in /health on localhost

Explains why this is an accepted risk (no escalation over file-based
token access), CORS protection, and tunnel guard. Prevents future
CSO scans from stripping it without providing an alternative auth path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: verify tunnel is alive before returning URL to pair-agent

Root cause: when ngrok dies externally (pkill, crash, timeout), the server
still reports tunnelActive=true with a dead URL. pair-agent prints an
instruction block pointing at a dead tunnel. The remote agent gets
"endpoint offline" and the user has to manually restart everything.

Three-layer fix:
- Server /pair endpoint: probes tunnel URL before returning it. If dead,
  resets tunnelActive/tunnelUrl and returns null (triggers CLI restart).
- Server /tunnel/start: probes cached tunnel before returning already_active.
  If dead, falls through to restart ngrok automatically.
- CLI pair-agent: double-checks tunnel URL from server before printing
  instruction block. Falls through to auto-start on failure.

4 regression tests verify all three probe points + CLI verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add POST /batch endpoint for multi-command batching

Remote agents controlling GStack Browser through a tunnel pay 2-5s of
latency per HTTP round-trip. A typical "navigate and read" takes 4
sequential commands = 10-20 seconds. The /batch endpoint collapses N
commands into a single HTTP round-trip, cutting a 20-tab crawl from
~60s to ~5s.

Sequential execution through the full security pipeline (scope, domain,
tab ownership, content wrapping). Rate limiting counts the batch as 1
request. Activity events emitted at batch level, not per-command.
Max 50 commands per batch. Nested batches rejected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
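
The structural limits on a batch (at most 50 commands, no nested batches, a `command` field on every entry) could be enforced up front along these lines. The `BatchCommand` shape and `validateBatch` are hypothetical; the request body format of the real endpoint is an assumption here.

```typescript
const MAX_BATCH_COMMANDS = 50;

// Hypothetical entry shape; only `command` is required in this sketch.
interface BatchCommand {
  command: string;
  args?: string[];
  tabId?: number;
}

// Reject malformed batches before any command runs.
function validateBatch(commands: unknown): BatchCommand[] {
  if (!Array.isArray(commands)) throw new Error("commands must be an array");
  if (commands.length > MAX_BATCH_COMMANDS) {
    throw new Error(`max ${MAX_BATCH_COMMANDS} commands per batch`);
  }
  return commands.map((entry, i) => {
    if (
      typeof entry !== "object" ||
      entry === null ||
      typeof (entry as BatchCommand).command !== "string"
    ) {
      throw new Error(`commands[${i}]: missing command field`);
    }
    const cmd = entry as BatchCommand;
    if (cmd.command === "batch") {
      throw new Error(`commands[${i}]: nested batches are rejected`);
    }
    return cmd;
  });
}
```

Each validated entry would then go through the same per-command security pipeline as a standalone request.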

* test: add source-level security tests for /batch endpoint

8 tests verifying: auth gate placement, scoped token support, max
command limit, nested batch rejection, rate limiting bypass, batch-level
activity events, command field validation, and tabId passthrough.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: correct CHANGELOG date from 2026-04-06 to 2026-04-05

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: consolidate Hermes into generic HTTP option in pair-agent

Hermes doesn't have a host-specific config — it uses the same generic
curl instructions as any other agent. Removing the dedicated option
simplifies the menu and eliminates a misleading distinction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump VERSION to 0.15.14.0, add CHANGELOG entry for batch endpoint

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate pair-agent/SKILL.md after main merge

Vendoring deprecation section from main's template wasn't reflected
in the generated file. Fixes check-freshness CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: checkTabAccess uses options object, add own-only tab policy

Refactors checkTabAccess(tabId, clientId, isWrite) to use an options
object { isWrite?, ownOnly? }. Adds tabPolicy === 'own-only' support
in the server command dispatch — scoped tokens with this policy are
restricted to their own tabs for all commands, not just writes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
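
A sketch of the options-object form, combining it with the ownership rules from the earlier tab-isolation commits (reads always allowed, writes need ownership, unowned tabs root-only for writes). The explicit `owners` map parameter and the `"root"` client ID are assumptions made to keep the example self-contained; the real method lives on BrowserManager.

```typescript
interface TabAccessOptions {
  isWrite?: boolean; // the command mutates the tab
  ownOnly?: boolean; // token policy is 'own-only': reads need ownership too
}

// `owners` maps tabId to the owning clientId; unowned tabs have no entry.
function checkTabAccess(
  owners: Map<number, string>,
  tabId: number,
  clientId: string,
  options: TabAccessOptions = {},
): boolean {
  const { isWrite = false, ownOnly = false } = options;
  if (clientId === "root") return true; // root is unrestricted (assumed ID)
  const owner = owners.get(tabId);
  // Writes, and every command under own-only, require owning the tab.
  if (isWrite || ownOnly) return owner === clientId;
  return true; // plain reads are always allowed
}
```

An options object keeps call sites readable (`{ isWrite: true }` instead of a bare `true`) and leaves room for further policies without changing the positional signature.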

* feat: add --domain flag to pair-agent CLI for domain restrictions

Allows passing --domain to pair-agent to restrict the remote agent's
navigation to specific domains (comma-separated).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* revert: remove batch commands CHANGELOG entry and VERSION bump

The batch endpoint work belongs on the browser-batch-multitab branch
(port-louis), not this branch. Reverting VERSION to 0.15.14.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: adopt main's headed-mode /health token serving

Our merge kept the old !tunnelActive guard which conflicted with
main's security-audit-r2 tests that require no currentUrl/currentMessage
in /health. Adopts main's approach: serve token conditionally based on
headed mode or chrome-extension origin. Updates server-auth tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: improve snapshot flags docs completeness for LLM judge

Adds $B placeholder explanation, explicit syntax line, and detailed
flag behavior (-d depth values, -s CSS selector syntax, -D unified
diff format and baseline persistence, -a screenshot vs text output
relationship). Fixes snapshot flags reference LLM eval scoring
completeness < 4.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 14:41:06 -07:00


import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import { execSync } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
const ROOT = path.resolve(import.meta.dir, '..');
const BIN = path.join(ROOT, 'bin');
let tmpDir: string;
let skillsDir: string;
let installDir: string;
function run(cmd: string, env: Record<string, string> = {}, expectFail = false): string {
  try {
    return execSync(cmd, {
      cwd: ROOT,
      env: { ...process.env, GSTACK_STATE_DIR: tmpDir, ...env },
      encoding: 'utf-8',
      timeout: 10000,
      stdio: ['pipe', 'pipe', 'pipe'],
    }).trim();
  } catch (e: any) {
    if (expectFail) return (e.stderr || e.stdout || '').toString().trim();
    throw e;
  }
}

// Create a mock gstack install directory with skill subdirs
function setupMockInstall(skills: string[]): void {
  installDir = path.join(tmpDir, 'gstack-install');
  skillsDir = path.join(tmpDir, 'skills');
  fs.mkdirSync(installDir, { recursive: true });
  fs.mkdirSync(skillsDir, { recursive: true });
  // Copy the real gstack-config and gstack-relink to the mock install
  const mockBin = path.join(installDir, 'bin');
  fs.mkdirSync(mockBin, { recursive: true });
  fs.copyFileSync(path.join(BIN, 'gstack-config'), path.join(mockBin, 'gstack-config'));
  fs.chmodSync(path.join(mockBin, 'gstack-config'), 0o755);
  if (fs.existsSync(path.join(BIN, 'gstack-relink'))) {
    fs.copyFileSync(path.join(BIN, 'gstack-relink'), path.join(mockBin, 'gstack-relink'));
    fs.chmodSync(path.join(mockBin, 'gstack-relink'), 0o755);
  }
  if (fs.existsSync(path.join(BIN, 'gstack-patch-names'))) {
    fs.copyFileSync(path.join(BIN, 'gstack-patch-names'), path.join(mockBin, 'gstack-patch-names'));
    fs.chmodSync(path.join(mockBin, 'gstack-patch-names'), 0o755);
  }
  // Create mock skill directories with proper frontmatter
  for (const skill of skills) {
    fs.mkdirSync(path.join(installDir, skill), { recursive: true });
    fs.writeFileSync(
      path.join(installDir, skill, 'SKILL.md'),
      `---\nname: ${skill}\ndescription: test\n---\n# ${skill}`
    );
  }
}

beforeEach(() => {
  tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-relink-test-'));
});

afterEach(() => {
  fs.rmSync(tmpDir, { recursive: true, force: true });
});
describe('gstack-relink (#578)', () => {
  // Test 11: prefixed symlinks when skill_prefix=true
  test('creates gstack-* symlinks when skill_prefix=true', () => {
    setupMockInstall(['qa', 'ship', 'review']);
    // Set config to prefix mode (pass install/skills env so auto-relink uses mock install)
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    // Run relink with env pointing to the mock install
    const output = run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    // Verify gstack-* symlinks exist
    expect(fs.existsSync(path.join(skillsDir, 'gstack-qa'))).toBe(true);
    expect(fs.existsSync(path.join(skillsDir, 'gstack-ship'))).toBe(true);
    expect(fs.existsSync(path.join(skillsDir, 'gstack-review'))).toBe(true);
    expect(output).toContain('gstack-');
  });
  // Test 12: flat symlinks when skill_prefix=false
  test('creates flat symlinks when skill_prefix=false', () => {
    setupMockInstall(['qa', 'ship', 'review']);
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix false`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    const output = run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    expect(fs.existsSync(path.join(skillsDir, 'qa'))).toBe(true);
    expect(fs.existsSync(path.join(skillsDir, 'ship'))).toBe(true);
    expect(fs.existsSync(path.join(skillsDir, 'review'))).toBe(true);
    expect(output).toContain('flat');
  });
  // REGRESSION: unprefixed skills must be real directories, not symlinks (#761)
  // Claude Code auto-prefixes skills nested under a parent dir symlink.
  // e.g., `qa -> gstack/qa` gets discovered as "gstack-qa", not "qa".
  // The fix: create real directories with SKILL.md symlinks inside.
  test('unprefixed skills are real directories with SKILL.md symlinks, not dir symlinks', () => {
    setupMockInstall(['qa', 'ship', 'review', 'plan-ceo-review']);
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix false`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    for (const skill of ['qa', 'ship', 'review', 'plan-ceo-review']) {
      const skillPath = path.join(skillsDir, skill);
      const skillMdPath = path.join(skillPath, 'SKILL.md');
      // Must be a real directory, NOT a symlink
      expect(fs.lstatSync(skillPath).isDirectory()).toBe(true);
      expect(fs.lstatSync(skillPath).isSymbolicLink()).toBe(false);
      // Must contain a SKILL.md that IS a symlink
      expect(fs.existsSync(skillMdPath)).toBe(true);
      expect(fs.lstatSync(skillMdPath).isSymbolicLink()).toBe(true);
      // The SKILL.md symlink must point to the source skill's SKILL.md
      const target = fs.readlinkSync(skillMdPath);
      expect(target).toContain(skill);
      expect(target).toEndWith('/SKILL.md');
    }
  });
  // Same invariant for prefixed mode
  test('prefixed skills are real directories with SKILL.md symlinks, not dir symlinks', () => {
    setupMockInstall(['qa', 'ship']);
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    for (const skill of ['gstack-qa', 'gstack-ship']) {
      const skillPath = path.join(skillsDir, skill);
      const skillMdPath = path.join(skillPath, 'SKILL.md');
      expect(fs.lstatSync(skillPath).isDirectory()).toBe(true);
      expect(fs.lstatSync(skillPath).isSymbolicLink()).toBe(false);
      expect(fs.lstatSync(skillMdPath).isSymbolicLink()).toBe(true);
    }
  });
  // Upgrade: old directory symlinks get replaced with real directories
  test('upgrades old directory symlinks to real directories', () => {
    setupMockInstall(['qa', 'ship']);
    // Simulate old behavior: create directory symlinks (the old pattern)
    fs.symlinkSync(path.join(installDir, 'qa'), path.join(skillsDir, 'qa'));
    fs.symlinkSync(path.join(installDir, 'ship'), path.join(skillsDir, 'ship'));
    // Verify they start as symlinks
    expect(fs.lstatSync(path.join(skillsDir, 'qa')).isSymbolicLink()).toBe(true);
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix false`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    // After relink: must be real directories, not symlinks
    expect(fs.lstatSync(path.join(skillsDir, 'qa')).isSymbolicLink()).toBe(false);
    expect(fs.lstatSync(path.join(skillsDir, 'qa')).isDirectory()).toBe(true);
    expect(fs.lstatSync(path.join(skillsDir, 'qa', 'SKILL.md')).isSymbolicLink()).toBe(true);
  });
  // FIRST INSTALL: --no-prefix must create ONLY flat names, zero gstack-* pollution
  test('first install --no-prefix: only flat names exist, zero gstack-* entries', () => {
    setupMockInstall(['qa', 'ship', 'review', 'plan-ceo-review', 'gstack-upgrade']);
    // Simulate first install: no saved config, pass --no-prefix equivalent
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix false`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    // Enumerate everything in skills dir
    const entries = fs.readdirSync(skillsDir);
    // Expected: qa, ship, review, plan-ceo-review, gstack-upgrade (its real name)
    expect(entries.sort()).toEqual(['gstack-upgrade', 'plan-ceo-review', 'qa', 'review', 'ship']);
    // No gstack-qa, gstack-ship, gstack-review, gstack-plan-ceo-review
    const leaked = entries.filter(e => e.startsWith('gstack-') && e !== 'gstack-upgrade');
    expect(leaked).toEqual([]);
  });
  // FIRST INSTALL: --prefix must create ONLY gstack-* names, zero flat-name pollution
  test('first install --prefix: only gstack-* entries exist, zero flat names', () => {
    setupMockInstall(['qa', 'ship', 'review', 'plan-ceo-review', 'gstack-upgrade']);
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    const entries = fs.readdirSync(skillsDir);
    // Expected: gstack-qa, gstack-ship, gstack-review, gstack-plan-ceo-review, gstack-upgrade
    expect(entries.sort()).toEqual([
      'gstack-plan-ceo-review', 'gstack-qa', 'gstack-review', 'gstack-ship', 'gstack-upgrade',
    ]);
    // No unprefixed qa, ship, review, plan-ceo-review
    const leaked = entries.filter(e => !e.startsWith('gstack-'));
    expect(leaked).toEqual([]);
  });
  // FIRST INSTALL: non-TTY (no saved config, piped stdin) defaults to flat names
  test('non-TTY first install defaults to flat names via relink', () => {
    setupMockInstall(['qa', 'ship']);
    // Don't set any config; this simulates a fresh install.
    // gstack-relink reads config; on fresh install config returns empty → defaults to false
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    const entries = fs.readdirSync(skillsDir);
    // Should be flat names (relink defaults to false when config returns empty)
    expect(entries.sort()).toEqual(['qa', 'ship']);
  });
  // SWITCH: prefix → no-prefix must clean up ALL gstack-* entries
  test('switching prefix to no-prefix removes all gstack-* entries completely', () => {
    setupMockInstall(['qa', 'ship', 'review', 'plan-ceo-review', 'gstack-upgrade']);
    // Start in prefix mode
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    let entries = fs.readdirSync(skillsDir);
    expect(entries.filter(e => !e.startsWith('gstack-'))).toEqual([]);
    // Switch to no-prefix
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix false`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    entries = fs.readdirSync(skillsDir);
    // Only flat names + gstack-upgrade (its real name)
    expect(entries.sort()).toEqual(['gstack-upgrade', 'plan-ceo-review', 'qa', 'review', 'ship']);
    const leaked = entries.filter(e => e.startsWith('gstack-') && e !== 'gstack-upgrade');
    expect(leaked).toEqual([]);
  });
  // SWITCH: no-prefix → prefix must clean up ALL flat entries
  test('switching no-prefix to prefix removes all flat entries completely', () => {
    setupMockInstall(['qa', 'ship', 'review', 'gstack-upgrade']);
    // Start in no-prefix mode
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix false`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    let entries = fs.readdirSync(skillsDir);
    expect(entries.filter(e => e.startsWith('gstack-') && e !== 'gstack-upgrade')).toEqual([]);
    // Switch to prefix
    run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
      GSTACK_INSTALL_DIR: installDir,
      GSTACK_SKILLS_DIR: skillsDir,
    });
    entries = fs.readdirSync(skillsDir);
    // Only gstack-* names
    expect(entries.sort()).toEqual([
      'gstack-qa', 'gstack-review', 'gstack-ship', 'gstack-upgrade',
    ]);
    const leaked = entries.filter(e => !e.startsWith('gstack-'));
    expect(leaked).toEqual([]);
  });
// Test 13: cleans stale symlinks from opposite mode
test('cleans up stale symlinks from opposite mode', () => {
setupMockInstall(['qa', 'ship']);
// Create prefixed symlinks first
run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
expect(fs.existsSync(path.join(skillsDir, 'gstack-qa'))).toBe(true);
// Switch to flat mode
run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix false`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
// Flat symlinks should exist, prefixed should be gone
expect(fs.existsSync(path.join(skillsDir, 'qa'))).toBe(true);
expect(fs.existsSync(path.join(skillsDir, 'gstack-qa'))).toBe(false);
});
// Test 14: error when install dir missing
test('prints error when install dir missing', () => {
const output = run(`${path.join(BIN, 'gstack-relink')}`, {
GSTACK_INSTALL_DIR: '/nonexistent/path/gstack',
GSTACK_SKILLS_DIR: '/nonexistent/path/skills',
}, true);
expect(output).toContain('setup');
});
// Test: gstack-upgrade does NOT get double-prefixed
test('does not double-prefix gstack-upgrade directory', () => {
setupMockInstall(['qa', 'ship', 'gstack-upgrade']);
run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
// gstack-upgrade should keep its name, NOT become gstack-gstack-upgrade
expect(fs.existsSync(path.join(skillsDir, 'gstack-upgrade'))).toBe(true);
expect(fs.existsSync(path.join(skillsDir, 'gstack-gstack-upgrade'))).toBe(false);
// Regular skills still get prefixed
expect(fs.existsSync(path.join(skillsDir, 'gstack-qa'))).toBe(true);
});
// Test 15: gstack-config set skill_prefix triggers relink
test('gstack-config set skill_prefix triggers relink', () => {
setupMockInstall(['qa', 'ship']);
// Run gstack-config set which should auto-trigger relink
run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
// If relink was triggered, symlinks should exist
expect(fs.existsSync(path.join(skillsDir, 'gstack-qa'))).toBe(true);
expect(fs.existsSync(path.join(skillsDir, 'gstack-ship'))).toBe(true);
});
});
describe('upgrade migrations', () => {
const MIGRATIONS_DIR = path.join(ROOT, 'gstack-upgrade', 'migrations');
test('migrations directory exists', () => {
expect(fs.existsSync(MIGRATIONS_DIR)).toBe(true);
});
test('all migration scripts are executable and parse without syntax errors', () => {
const scripts = fs.readdirSync(MIGRATIONS_DIR).filter(f => f.endsWith('.sh'));
expect(scripts.length).toBeGreaterThan(0);
for (const script of scripts) {
const fullPath = path.join(MIGRATIONS_DIR, script);
// Must be executable
const stat = fs.statSync(fullPath);
expect(stat.mode & 0o111).toBeGreaterThan(0);
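// Must parse without syntax errors (bash -n checks syntax without executing the script).
// execSync throws on a non-zero exit status, so a syntax error fails the test at this call.
const result = execSync(`bash -n "${fullPath}" 2>&1`, { encoding: 'utf-8', timeout: 5000 });
// bash -n outputs nothing on success
expect(result.trim()).toBe('');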
}
});
test('migration filenames follow v{VERSION}.sh pattern', () => {
const scripts = fs.readdirSync(MIGRATIONS_DIR).filter(f => f.endsWith('.sh'));
for (const script of scripts) {
expect(script).toMatch(/^v\d+\.\d+\.\d+\.\d+\.sh$/);
}
});
test('v0.15.2.0 migration runs gstack-relink', () => {
const content = fs.readFileSync(path.join(MIGRATIONS_DIR, 'v0.15.2.0.sh'), 'utf-8');
expect(content).toContain('gstack-relink');
});
test('v0.15.2.0 migration fixes stale directory symlinks', () => {
setupMockInstall(['qa', 'ship', 'review']);
// Simulate old state: directory symlinks (pre-v0.15.2.0 pattern)
fs.symlinkSync(path.join(installDir, 'qa'), path.join(skillsDir, 'qa'));
fs.symlinkSync(path.join(installDir, 'ship'), path.join(skillsDir, 'ship'));
fs.symlinkSync(path.join(installDir, 'review'), path.join(skillsDir, 'review'));
// Set no-prefix mode. GSTACK_SETUP_RUNNING=1 suppresses the auto-relink so the
// directory symlinks created above stay intact until the migration runs.
run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix false`, {
GSTACK_SETUP_RUNNING: '1',
});
// Verify old state: the skill entry is still a directory symlink
expect(fs.lstatSync(path.join(skillsDir, 'qa')).isSymbolicLink()).toBe(true);
// Run the migration (it calls gstack-relink internally)
run(`bash ${path.join(MIGRATIONS_DIR, 'v0.15.2.0.sh')}`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
// After migration: real directories with SKILL.md symlinks
for (const skill of ['qa', 'ship', 'review']) {
const skillPath = path.join(skillsDir, skill);
expect(fs.lstatSync(skillPath).isSymbolicLink()).toBe(false);
expect(fs.lstatSync(skillPath).isDirectory()).toBe(true);
expect(fs.lstatSync(path.join(skillPath, 'SKILL.md')).isSymbolicLink()).toBe(true);
}
});
});
describe('gstack-patch-names (#620/#578)', () => {
// Helper to read name: from SKILL.md frontmatter
function readSkillName(skillDir: string): string | null {
const content = fs.readFileSync(path.join(skillDir, 'SKILL.md'), 'utf-8');
const match = content.match(/^name:\s*(.+)$/m);
return match ? match[1].trim() : null;
}
test('prefix=true patches name: field in SKILL.md', () => {
setupMockInstall(['qa', 'ship', 'review']);
run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
// Verify name: field is patched with gstack- prefix
expect(readSkillName(path.join(installDir, 'qa'))).toBe('gstack-qa');
expect(readSkillName(path.join(installDir, 'ship'))).toBe('gstack-ship');
expect(readSkillName(path.join(installDir, 'review'))).toBe('gstack-review');
});
test('prefix=false restores name: field in SKILL.md', () => {
setupMockInstall(['qa', 'ship']);
// First, prefix them
run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
expect(readSkillName(path.join(installDir, 'qa'))).toBe('gstack-qa');
// Now switch to flat mode
run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix false`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
// Verify name: field is restored to unprefixed
expect(readSkillName(path.join(installDir, 'qa'))).toBe('qa');
expect(readSkillName(path.join(installDir, 'ship'))).toBe('ship');
});
test('gstack-upgrade name: not double-prefixed', () => {
setupMockInstall(['qa', 'gstack-upgrade']);
run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
// gstack-upgrade should keep its name, NOT become gstack-gstack-upgrade
expect(readSkillName(path.join(installDir, 'gstack-upgrade'))).toBe('gstack-upgrade');
// Regular skill should be prefixed
expect(readSkillName(path.join(installDir, 'qa'))).toBe('gstack-qa');
});
test('SKILL.md without frontmatter is a no-op', () => {
setupMockInstall(['qa']);
// Overwrite qa SKILL.md with no frontmatter
fs.writeFileSync(path.join(installDir, 'qa', 'SKILL.md'), '# qa\nSome content.');
run(`${path.join(installDir, 'bin', 'gstack-config')} set skill_prefix true`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
// Should not crash
run(`${path.join(installDir, 'bin', 'gstack-relink')}`, {
GSTACK_INSTALL_DIR: installDir,
GSTACK_SKILLS_DIR: skillsDir,
});
// Content should be unchanged (no name: to patch)
const content = fs.readFileSync(path.join(installDir, 'qa', 'SKILL.md'), 'utf-8');
expect(content).toBe('# qa\nSome content.');
});
});