Files
gstack/test/worktree.test.ts
T
Garry Tan 8ca950f6f1 feat: content security — 4-layer prompt injection defense for pair-agent (#815)
* feat: token registry for multi-agent browser access

Per-agent scoped tokens with read/write/admin/meta command categories,
domain glob restrictions, rate limiting, expiry, and revocation. Setup
key exchange for the /pair-agent ceremony (5-min one-time key → 24h
session token). Idempotent exchange handles tunnel drops. 39 tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate token registry + scoped auth into browse server

Server changes for multi-agent browser access:
- /connect endpoint: setup key exchange for /pair-agent ceremony
- /token endpoint: root-only minting of scoped sub-tokens
- /token/:clientId DELETE: revoke agent tokens
- /agents endpoint: list connected agents (root-only)
- /health: strips root token when tunnel is active (P0 security fix)
- /command: scope/rate/domain checks via token registry before dispatch
- Idle timer skips shutdown when tunnel is active

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: ngrok tunnel integration + @ngrok/ngrok dependency

BROWSE_TUNNEL=1 env var starts an ngrok tunnel after Bun.serve().
Reads NGROK_AUTHTOKEN from env or ~/.gstack/ngrok.env. Reads
NGROK_DOMAIN for dedicated domain (stable URL). Updates state
file with tunnel URL. Feasibility spike confirmed: SDK works in
compiled Bun binary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: tab isolation for multi-agent browser access

Add per-tab ownership tracking to BrowserManager. Scoped agents
must create their own tab via newtab before writing. Unowned tabs
(pre-existing, user-opened) are root-only for writes. Read access
always allowed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: tab enforcement + POST /pair endpoint + activity attribution

Server-side tab ownership check blocks scoped agents from writing to
unowned tabs. Special-case newtab records ownership for scoped tokens.
POST /pair endpoint creates setup keys for the pairing ceremony.
Activity events now include clientId for attribution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: pair-agent CLI command + instruction block generator

One command to pair a remote agent: $B pair-agent. Creates a setup
key via POST /pair, prints a copy-pasteable instruction block with
curl commands. Smart tunnel fallback (tunnel URL > auto-start >
localhost). Flags: --for HOST, --local HOST, --admin, --client NAME.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: tab isolation + instruction block generator tests

14 tests covering tab ownership lifecycle (access checks, unowned
tabs, transferTab) and instruction block generator (scopes, URLs,
admin flag, troubleshooting section). Fix server-auth test that
used fragile sliceBetween boundaries broken by new endpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.15.9.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: CSO security fixes — token leak, domain bypass, input validation

1. Remove root token from /health endpoint entirely (CSO #1 CRITICAL).
   Origin header is spoofable. Extension reads from ~/.gstack/.auth.json.
2. Add domain check for newtab URL (CSO #5). Previously only goto was
   checked, allowing domain-restricted agents to bypass via newtab.
3. Validate scope values, rateLimit, expiresSeconds in createToken()
   (CSO #4). Rejects invalid scopes and negative values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /pair-agent skill — syntactic sugar for browser sharing

Users remember /pair-agent, not $B pair-agent. The skill walks through
agent selection (OpenClaw, Hermes, Codex, Cursor, generic), local vs
remote setup, tunnel configuration, and includes platform-specific
notes for each agent type. Wraps the CLI command with context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remote browser access reference for paired agents

Full API reference, snapshot→@ref pattern, scopes, tab isolation,
error codes, ngrok setup, and same-machine shortcuts. The instruction
block points here for deeper reading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: improved instruction block with snapshot→@ref pattern

The paste-into-agent instruction block now teaches the snapshot→@ref
workflow (the most powerful browsing pattern), shows the server URL
prominently, and uses clearer formatting. Tests updated to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: smart ngrok detection + auto-tunnel in pair-agent

The pair-agent command now checks ngrok's native config (not just
~/.gstack/ngrok.env) and auto-starts the tunnel when ngrok is
available. The skill template walks users through ngrok install
and auth if not set up, instead of just printing a dead localhost
URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: on-demand tunnel start via POST /tunnel/start

pair-agent now auto-starts the ngrok tunnel without restarting the
server. New POST /tunnel/start endpoint reads authtoken from env,
~/.gstack/ngrok.env, or ngrok's native config. CLI detects ngrok
availability and calls the endpoint automatically. Zero manual steps
when ngrok is installed and authed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pair-agent skill must output the instruction block verbatim

Added CRITICAL instruction: the agent MUST output the full instruction
block so the user can copy it. Previously the agent could summarize
over it, leaving the user with nothing to paste.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: scoped tokens rejected on /command — auth gate ordering bug

The blanket validateAuth() gate (root-only) sat above the /command
endpoint, rejecting all scoped tokens with 401 before they reached
getTokenInfo(). Moved /command above the gate so both root and
scoped tokens are accepted. This was the bug Wintermute hit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: pair-agent auto-launches headed mode before pairing

When pair-agent detects headless mode, it auto-switches to headed
(visible Chromium window) so the user can watch what the remote
agent does. Use --headless to skip this. Fixed compiled binary
path resolution (process.execPath, not process.argv[1] which is
virtual /$bunfs/ in Bun compiled binaries).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: comprehensive tests for auth ordering, tunnel, ngrok, headed mode

16 new tests covering:
- /command sits above blanket auth gate (Wintermute bug)
- /command uses getTokenInfo not validateAuth
- /tunnel/start requires root, checks native ngrok config, returns already_active
- /pair creates setup keys not session tokens
- Tab ownership checked before command dispatch
- Activity events include clientId
- Instruction block teaches snapshot→@ref pattern
- pair-agent auto-headed mode, process.execPath, --headless skip
- isNgrokAvailable checks all 3 sources (gstack env, env var, native config)
- handlePairAgent calls /tunnel/start not server restart

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: chain scope bypass + /health info leak when tunneled

1. Chain command now pre-validates ALL subcommand scopes before
   executing any. A read+meta token can no longer escalate to
   admin via chain (eval, js, cookies were dispatched without
   scope checks). tokenInfo flows through handleMetaCommand into
   the chain handler. Rejects entire chain if any subcommand fails.

2. /health strips sensitive fields (currentUrl, agent.currentMessage,
   session) when tunnel is active. Only operational metadata (status,
   mode, uptime, tabs) exposed to the internet. Previously anyone
   reaching the ngrok URL could surveil browsing activity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: tout /pair-agent as headline feature in CHANGELOG + README

Lead with what it does for the user: type /pair-agent, paste into
your other agent, done. First time AI agents from different companies
can coordinate through a shared browser with real security boundaries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: expand /pair-agent, /design-shotgun, /design-html in README

Each skill gets a real narrative paragraph explaining the workflow,
not just a table cell. design-shotgun: visual exploration with taste
memory. design-html: production HTML with Pretext computed layout.
pair-agent: cross-vendor AI agent coordination through shared browser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: split handleCommand into handleCommandInternal + HTTP wrapper

Chain subcommands now route through handleCommandInternal for full security
enforcement (scope, domain, tab ownership, rate limiting, content wrapping).
Adds recursion guard for nested chains, rate-limit exemption for chain
subcommands, and activity event suppression (1 event per chain, not per sub).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add content-security.ts with datamarking, envelope, and filter hooks

Four-layer prompt injection defense for pair-agent browser sharing:
- Datamarking: session-scoped watermark for text exfiltration detection
- Content envelope: trust boundary wrapping with ZWSP marker escaping
- Content filter hooks: extensible filter pipeline with warn/block modes
- Built-in URL blocklist: requestbin, pipedream, webhook.site, etc.

BROWSE_CONTENT_FILTER env var controls mode: off|warn|block (default: warn)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: centralize content wrapping in handleCommandInternal response path

Single wrapping location replaces fragmented per-handler wrapping:
- Scoped tokens: content filters + datamarking + enhanced envelope
- Root tokens: existing basic wrapping (backward compat)
- Chain subcommands exempt from top-level wrapping (wrapped individually)
- Adds 'attrs' to PAGE_CONTENT_COMMANDS (ARIA value exposure defense)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: hidden element stripping for scoped token text extraction

Detects CSS-hidden elements (opacity, font-size, off-screen, same-color,
clip-path) and ARIA label injection patterns. Marks elements with
data-gstack-hidden, extracts text from a clean clone (no DOM mutation),
then removes markers. Only active for scoped tokens on text command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: snapshot split output format for scoped tokens

Scoped tokens get a split snapshot: trusted @refs section (for click/fill)
separated from untrusted web content in an envelope. Ref names truncated
to 50 chars in trusted section. Root tokens unchanged (backward compat).
Resume command also uses split format for scoped tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add SECURITY section to pair-agent instruction block

Instructs remote agents to treat content inside untrusted envelopes
as potentially malicious. Lists common injection phrases to watch for.
Directs agents to only use @refs from the trusted INTERACTIVE ELEMENTS
section, not from page content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add 4 prompt injection test fixtures

- injection-visible.html: visible injection in product review text
- injection-hidden.html: 7 CSS hiding techniques + ARIA injection + false positive
- injection-social.html: social engineering in legitimate-looking content
- injection-combined.html: all attack types + envelope escape attempt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: comprehensive content security tests (47 tests)

Covers all 4 defense layers:
- Datamarking: marker format, session consistency, text-only application
- Content envelope: wrapping, ZWSP marker escaping, filter warnings
- Content filter hooks: URL blocklist, custom filters, warn/block modes
- Instruction block: SECURITY section content, ordering, generation
- Centralized wrapping: source-level verification of integration
- Chain security: recursion guard, rate-limit exemption, activity suppression
- Hidden element stripping: 7 CSS techniques, ARIA injection, false positives
- Snapshot split format: scoped vs root output, resume integration

Also fixes: visibility:hidden detection, case-insensitive ARIA pattern matching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: pair-agent skill compliance + fix all 16 pre-existing test failures

Root cause: pair-agent was added without completing the gen-skill-docs
compliance checklist. All 16 failures traced back to this.

Fixes:
- Sync package.json version to VERSION (0.15.9.0)
- Add "(gstack)" to pair-agent description for discoverability
- Add pair-agent to Codex path exception (legitimately documents ~/.codex/)
- Add CLI_COMMANDS (status, pair-agent, tunnel) to skill parser allowlist
- Regenerate SKILL.md for all hosts (claude, codex, factory, kiro, etc.)
- Update golden file baselines for ship skill
- Fix relink tests: pass GSTACK_INSTALL_DIR to auto-relink calls so they
  use the fast mock install instead of scanning real ~/.claude/skills/gstack

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.15.12.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: E2E exit reason precedence + worktree prune race condition

Two fixes for E2E test reliability:

1. session-runner.ts: error_max_turns was misclassified as error_api
   because is_error flag was checked before subtype. Now known subtypes
   like error_max_turns are preserved even when is_error is set. The
   is_error override only applies when subtype=success (API failure).

2. worktree.ts: pruneStale() now skips worktrees < 1 hour old to avoid
   deleting worktrees from concurrent test runs still in progress.
   Previously any second test execution would kill the first's worktrees.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore token in /health for localhost extension auth

The CSO security fix stripped the token from /health to prevent leaking
when tunneled. But the extension needs it to authenticate on localhost.
Now returns token only when not tunneled (safe: localhost-only path).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: verify /health token is localhost-only, never served through tunnel

Updated tests to match the restored token behavior:
- Test 1: token assignment exists AND is inside the !tunnelActive guard
- Test 1b: tunnel branch (else block) does not contain AUTH_TOKEN

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add security rationale for token in /health on localhost

Explains why this is an accepted risk (no escalation over file-based
token access), CORS protection, and tunnel guard. Prevents future
CSO scans from stripping it without providing an alternative auth path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: verify tunnel is alive before returning URL to pair-agent

Root cause: when ngrok dies externally (pkill, crash, timeout), the server
still reports tunnelActive=true with a dead URL. pair-agent prints an
instruction block pointing at a dead tunnel. The remote agent gets
"endpoint offline" and the user has to manually restart everything.

Three-layer fix:
- Server /pair endpoint: probes tunnel URL before returning it. If dead,
  resets tunnelActive/tunnelUrl and returns null (triggers CLI restart).
- Server /tunnel/start: probes cached tunnel before returning already_active.
  If dead, falls through to restart ngrok automatically.
- CLI pair-agent: double-checks tunnel URL from server before printing
  instruction block. Falls through to auto-start on failure.

4 regression tests verify all three probe points + CLI verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add POST /batch endpoint for multi-command batching

Remote agents controlling GStack Browser through a tunnel pay 2-5s of
latency per HTTP round-trip. A typical "navigate and read" takes 4
sequential commands = 10-20 seconds. The /batch endpoint collapses N
commands into a single HTTP round-trip, cutting a 20-tab crawl from
~60s to ~5s.

Sequential execution through the full security pipeline (scope, domain,
tab ownership, content wrapping). Rate limiting counts the batch as 1
request. Activity events emitted at batch level, not per-command.
Max 50 commands per batch. Nested batches rejected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add source-level security tests for /batch endpoint

8 tests verifying: auth gate placement, scoped token support, max
command limit, nested batch rejection, rate limiting bypass, batch-level
activity events, command field validation, and tabId passthrough.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: correct CHANGELOG date from 2026-04-06 to 2026-04-05

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: consolidate Hermes into generic HTTP option in pair-agent

Hermes doesn't have a host-specific config — it uses the same generic
curl instructions as any other agent. Removing the dedicated option
simplifies the menu and eliminates a misleading distinction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump VERSION to 0.15.14.0, add CHANGELOG entry for batch endpoint

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate pair-agent/SKILL.md after main merge

Vendoring deprecation section from main's template wasn't reflected
in the generated file. Fixes check-freshness CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: checkTabAccess uses options object, add own-only tab policy

Refactors checkTabAccess(tabId, clientId, isWrite) to use an options
object { isWrite?, ownOnly? }. Adds tabPolicy === 'own-only' support
in the server command dispatch — scoped tokens with this policy are
restricted to their own tabs for all commands, not just writes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add --domain flag to pair-agent CLI for domain restrictions

Allows passing --domain to pair-agent to restrict the remote agent's
navigation to specific domains (comma-separated).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* revert: remove batch commands CHANGELOG entry and VERSION bump

The batch endpoint work belongs on the browser-batch-multitab branch
(port-louis), not this branch. Reverting VERSION to 0.15.14.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: adopt main's headed-mode /health token serving

Our merge kept the old !tunnelActive guard which conflicted with
main's security-audit-r2 tests that require no currentUrl/currentMessage
in /health. Adopts main's approach: serve token conditionally based on
headed mode or chrome-extension origin. Updates server-auth tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: improve snapshot flags docs completeness for LLM judge

Adds $B placeholder explanation, explicit syntax line, and detailed
flag behavior (-d depth values, -s CSS selector syntax, -D unified
diff format and baseline persistence, -a screenshot vs text output
relationship). Fixes snapshot flags reference LLM eval scoring
completeness < 4.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 14:41:06 -07:00

275 lines
9.3 KiB
TypeScript

/**
* Unit tests for WorktreeManager.
*
* Tests worktree lifecycle: create, harvest, dedup, cleanup, prune.
* Each test creates real git worktrees in a temporary repo.
*/
import { describe, test, expect, afterEach } from 'bun:test';
import { WorktreeManager } from '../lib/worktree';
import type { HarvestResult } from '../lib/worktree';
import { spawnSync } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
/** Create a minimal git repo in a tmpdir for testing. */
function createTestRepo(): string {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'worktree-test-'));
spawnSync('git', ['init'], { cwd: dir, stdio: 'pipe' });
spawnSync('git', ['config', 'user.email', 'test@test.com'], { cwd: dir, stdio: 'pipe' });
spawnSync('git', ['config', 'user.name', 'Test'], { cwd: dir, stdio: 'pipe' });
// Create initial commit so HEAD exists
fs.writeFileSync(path.join(dir, 'README.md'), '# Test repo\n');
// Add .gitignore matching real repo (so copied build artifacts don't appear as changes)
fs.writeFileSync(path.join(dir, '.gitignore'), '.agents/\nbrowse/dist/\n.gstack-worktrees/\n');
// Create a .agents directory (simulating gitignored build artifacts)
fs.mkdirSync(path.join(dir, '.agents', 'skills'), { recursive: true });
fs.writeFileSync(path.join(dir, '.agents', 'skills', 'test-skill.md'), '# Test skill\n');
// Create browse/dist (simulating build artifacts)
fs.mkdirSync(path.join(dir, 'browse', 'dist'), { recursive: true });
fs.writeFileSync(path.join(dir, 'browse', 'dist', 'browse'), '#!/bin/sh\necho browse\n');
spawnSync('git', ['add', 'README.md', '.gitignore'], { cwd: dir, stdio: 'pipe' });
spawnSync('git', ['commit', '-m', 'Initial commit'], { cwd: dir, stdio: 'pipe' });
return dir;
}
/** Clean up a test repo. */
function cleanupRepo(dir: string): void {
// Prune worktrees first to avoid git lock issues
spawnSync('git', ['worktree', 'prune'], { cwd: dir, stdio: 'pipe' });
fs.rmSync(dir, { recursive: true, force: true });
}
// Track repos to clean up
const repos: string[] = [];
// Dedup index path — clear before each test to avoid cross-run contamination
const DEDUP_PATH = path.join(os.homedir(), '.gstack-dev', 'harvests', 'dedup.json');
afterEach(() => {
for (const repo of repos) {
try { cleanupRepo(repo); } catch { /* best effort */ }
}
repos.length = 0;
// Clear dedup index so tests are independent
try { fs.unlinkSync(DEDUP_PATH); } catch { /* may not exist */ }
});
describe('WorktreeManager', () => {
test('create() produces a valid worktree at the expected path', () => {
const repo = createTestRepo();
repos.push(repo);
const mgr = new WorktreeManager(repo);
const worktreePath = mgr.create('test-1');
expect(fs.existsSync(worktreePath)).toBe(true);
expect(fs.existsSync(path.join(worktreePath, 'README.md'))).toBe(true);
expect(worktreePath).toContain('.gstack-worktrees');
expect(worktreePath).toContain('test-1');
mgr.cleanup('test-1');
});
test('create() worktree has .agents/skills/ (gitignored artifacts copied)', () => {
const repo = createTestRepo();
repos.push(repo);
const mgr = new WorktreeManager(repo);
const worktreePath = mgr.create('test-agents');
expect(fs.existsSync(path.join(worktreePath, '.agents', 'skills', 'test-skill.md'))).toBe(true);
expect(fs.existsSync(path.join(worktreePath, 'browse', 'dist', 'browse'))).toBe(true);
mgr.cleanup('test-agents');
});
test('create() stores correct originalSha', () => {
const repo = createTestRepo();
repos.push(repo);
const mgr = new WorktreeManager(repo);
const expectedSha = spawnSync('git', ['rev-parse', 'HEAD'], { cwd: repo, stdio: 'pipe' })
.stdout.toString().trim();
mgr.create('test-sha');
const info = mgr.getInfo('test-sha');
expect(info).toBeDefined();
expect(info!.originalSha).toBe(expectedSha);
mgr.cleanup('test-sha');
});
test('harvest() captures modifications to tracked files', () => {
const repo = createTestRepo();
repos.push(repo);
const mgr = new WorktreeManager(repo);
const worktreePath = mgr.create('test-harvest-mod');
// Modify a tracked file in the worktree
fs.writeFileSync(path.join(worktreePath, 'README.md'), '# Modified!\n');
const result = mgr.harvest('test-harvest-mod');
expect(result).not.toBeNull();
expect(result!.changedFiles).toContain('README.md');
expect(result!.isDuplicate).toBe(false);
expect(result!.patchPath).toBeTruthy();
expect(fs.existsSync(result!.patchPath)).toBe(true);
mgr.cleanup('test-harvest-mod');
});
test('harvest() captures new untracked files (git add -A path)', () => {
const repo = createTestRepo();
repos.push(repo);
const mgr = new WorktreeManager(repo);
const worktreePath = mgr.create('test-harvest-new');
// Create a new file in the worktree
fs.writeFileSync(path.join(worktreePath, 'new-file.txt'), 'Hello from agent\n');
const result = mgr.harvest('test-harvest-new');
expect(result).not.toBeNull();
expect(result!.changedFiles).toContain('new-file.txt');
mgr.cleanup('test-harvest-new');
});
test('harvest() captures committed changes (git diff originalSha)', () => {
const repo = createTestRepo();
repos.push(repo);
const mgr = new WorktreeManager(repo);
const worktreePath = mgr.create('test-harvest-commit');
// Make a commit in the worktree (simulating agent running git commit)
fs.writeFileSync(path.join(worktreePath, 'committed.txt'), 'Agent committed this\n');
spawnSync('git', ['add', 'committed.txt'], { cwd: worktreePath, stdio: 'pipe' });
spawnSync('git', ['commit', '-m', 'Agent commit'], { cwd: worktreePath, stdio: 'pipe' });
const result = mgr.harvest('test-harvest-commit');
expect(result).not.toBeNull();
expect(result!.changedFiles).toContain('committed.txt');
mgr.cleanup('test-harvest-commit');
});
test('harvest() returns null when worktree is clean', () => {
const repo = createTestRepo();
repos.push(repo);
const mgr = new WorktreeManager(repo);
mgr.create('test-harvest-clean');
// Don't modify anything
const result = mgr.harvest('test-harvest-clean');
expect(result).toBeNull();
mgr.cleanup('test-harvest-clean');
});
test('harvest() dedup skips identical patches', () => {
const repo = createTestRepo();
repos.push(repo);
// First run
const mgr1 = new WorktreeManager(repo);
const wt1 = mgr1.create('test-dedup-1');
fs.writeFileSync(path.join(wt1, 'dedup-test.txt'), 'same content\n');
const result1 = mgr1.harvest('test-dedup-1');
mgr1.cleanup('test-dedup-1');
expect(result1).not.toBeNull();
expect(result1!.isDuplicate).toBe(false);
// Second run with same change
const mgr2 = new WorktreeManager(repo);
const wt2 = mgr2.create('test-dedup-2');
fs.writeFileSync(path.join(wt2, 'dedup-test.txt'), 'same content\n');
const result2 = mgr2.harvest('test-dedup-2');
mgr2.cleanup('test-dedup-2');
expect(result2).not.toBeNull();
expect(result2!.isDuplicate).toBe(true);
});
test('cleanup() removes worktree directory', () => {
const repo = createTestRepo();
repos.push(repo);
const mgr = new WorktreeManager(repo);
const worktreePath = mgr.create('test-cleanup');
expect(fs.existsSync(worktreePath)).toBe(true);
mgr.cleanup('test-cleanup');
expect(fs.existsSync(worktreePath)).toBe(false);
});
test('pruneStale() removes orphaned worktrees from previous runs', () => {
const repo = createTestRepo();
repos.push(repo);
// Create a worktree with a different manager (simulating a previous run)
const oldMgr = new WorktreeManager(repo);
const oldPath = oldMgr.create('stale-test');
const oldRunDir = path.dirname(oldPath);
expect(fs.existsSync(oldPath)).toBe(true);
// Remove via git but leave directory (simulating a crash)
spawnSync('git', ['worktree', 'remove', '--force', oldPath], { cwd: repo, stdio: 'pipe' });
// Recreate the directory to simulate orphaned state
fs.mkdirSync(oldPath, { recursive: true });
// Backdate mtime to simulate a stale worktree (> 1 hour old)
const staleTime = new Date(Date.now() - 7200_000);
fs.utimesSync(oldRunDir, staleTime, staleTime);
// New manager should prune the old run's directory
const newMgr = new WorktreeManager(repo);
newMgr.pruneStale();
expect(fs.existsSync(oldRunDir)).toBe(false);
});
test('create() throws on failure (no silent fallback to ROOT)', () => {
const repo = createTestRepo();
repos.push(repo);
const mgr = new WorktreeManager(repo);
// Create the same worktree twice — second should fail because path exists
mgr.create('test-fail');
expect(() => mgr.create('test-fail')).toThrow();
mgr.cleanup('test-fail');
});
test('harvest() returns null gracefully when worktree dir was deleted by agent', () => {
const repo = createTestRepo();
repos.push(repo);
const mgr = new WorktreeManager(repo);
const worktreePath = mgr.create('test-deleted');
// Simulate agent deleting its own worktree directory
fs.rmSync(worktreePath, { recursive: true, force: true });
// harvest should return null gracefully, not throw
const result = mgr.harvest('test-deleted');
expect(result).toBeNull();
// cleanup should also be non-fatal
mgr.cleanup('test-deleted');
});
});