mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
04b709d91a
* test: add golden-file baselines for host config refactor Snapshot generated SKILL.md output for ship skill across all 3 existing hosts (Claude, Codex, Factory). These baselines verify the config-driven refactor produces identical output to the current hardcoded system. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add HostConfig interface and validator for declarative host system New scripts/host-config.ts defines the typed HostConfig interface that captures all per-host variation: paths, frontmatter rules, path/tool rewrites, suppressed resolvers, runtime root symlinks, install strategy, and behavioral config (co-author trailer, learnings mode, boundary instruction). Includes validateHostConfig() and validateAllConfigs() with regex-based security validation and cross-config uniqueness checks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add typed host configs for Claude, Codex, Factory, and Kiro Extract all hardcoded host-specific values from gen-skill-docs.ts, types.ts, preamble.ts, review.ts, and setup into typed HostConfig objects. Each host is a single file in hosts/ with its paths, frontmatter rules, path/tool rewrites, runtime root manifest, and install behavior. hosts/index.ts exports all configs, derives the Host type, and provides resolveHostArg() for CLI alias handling (e.g., 'agents' -> 'codex', 'droid' -> 'factory'). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: derive Host type and HOST_PATHS from host configs types.ts no longer hardcodes host names or paths. The Host type is derived from ALL_HOST_CONFIGS in hosts/index.ts, and HOST_PATHS is built dynamically from each config's globalRoot/localSkillRoot/usesEnvVars. Adding a new host to hosts/index.ts automatically extends the type system. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: gen-skill-docs.ts consumes typed host configs Replace hardcoded EXTERNAL_HOST_CONFIG, transformFrontmatter host branches, path/tool rewrite if-chains, and ALL_HOSTS array with config-driven lookups from hosts/*.ts. - Host detection uses resolveHostArg() (handles aliases like agents/droid) - transformFrontmatter uses config's allowlist/denylist mode, extraFields, conditionalFields, renameFields, and descriptionLimitBehavior - Path rewrites use config's pathRewrites array (replaceAll, order matters) - Tool rewrites use config's toolRewrites object - Skill skipping uses config's generation.skipSkills - ALL_HOSTS derived from ALL_HOST_NAMES - Token budget display regex derived from host configs Golden-file comparison: all 3 hosts produce IDENTICAL output to baselines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: preamble, co-author trailer, and resolver suppression use host configs - preamble.ts: hostConfigDir derived from config.globalRoot instead of hardcoded Record - utility.ts: generateCoAuthorTrailer reads from config.coAuthorTrailer instead of host switch statement - gen-skill-docs.ts: suppressedResolvers from config skip resolver execution at placeholder replacement time (belt+suspenders with existing ctx.host checks in individual resolvers) Golden-file comparison: all 3 hosts produce IDENTICAL output to baselines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: setup tooling uses config-driven host detection - host-config-export.ts: new CLI that exposes host configs to bash (list, get, detect, validate, symlinks commands) - bin/gstack-platform-detect: reads host configs instead of hardcoded binary/path mapping - scripts/skill-check.ts: iterates host configs for skill validation and freshness checks instead of separate Codex/Factory blocks - lib/worktree.ts: iterates host configs for directory copy instead of hardcoded .agents Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add OpenCode, Slate, and Cursor host configs Three new hosts added to the declarative config system. Each is a typed HostConfig object with paths, frontmatter rules, and path rewrites. All generate valid SKILL.md output with zero .claude/skills path leakage. - hosts/opencode.ts: OpenCode (opencode.ai), skills at ~/.config/opencode/ - hosts/slate.ts: Slate (Random Labs), skills at ~/.slate/ - hosts/cursor.ts: Cursor, skills at ~/.cursor/ - .gitignore: add .kiro/, .opencode/, .slate/, .cursor/, .openclaw/ Zero code changes needed — just config files + re-export in index.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add OpenClaw host config with adapter for tool mapping OpenClaw gets a hybrid approach: typed config for paths/frontmatter/ detection + a post-processing adapter for semantic tool rewrites. Config handles: path rewrites, frontmatter (name+description+version), CLAUDE.md→AGENTS.md, tool name rewrites (Bash→exec, Read→read, etc.), suppressed resolvers, SOUL.md via staticFiles. Adapter handles: AskUserQuestion→prose, Agent→sessions_spawn, $B→exec $B. Zero .claude/skills path leakage. Zero hardcoded tool references remaining. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: contributor add-host skill + fix version sync - contrib/add-host/SKILL.md.tmpl: contributor-only skill that guides new host config creation. Lives in contrib/, excluded from user installs. - package.json: sync version with VERSION file (0.15.2.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: add parameterized host smoke tests for all hosts 35 new tests covering all 7 external hosts (Codex, Factory, Kiro, OpenCode, Slate, Cursor, OpenClaw). Each host gets 4-5 tests: - output exists on disk with SKILL.md files - no .claude/skills path leakage in non-root skills - frontmatter has name + description fields - --dry-run freshness check passes - /codex skill excluded (for hosts with skipSkills: ['codex']) Tests are parameterized over ALL_HOST_CONFIGS so adding a new host automatically gets smoke-tested with zero new test code. Also updates --host all test to verify all registered hosts generate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: 100% coverage for host config system 71 new tests in test/host-config.test.ts covering: - hosts/index.ts: ALL_HOST_CONFIGS, getHostConfig, resolveHostArg (aliases), getExternalHosts, uniqueness checks - host-config.ts validateHostConfig: name regex, displayName, cliCommand, cliAliases, globalRoot, localSkillRoot, hostSubdir, frontmatter.mode, linkingStrategy, shell injection attempts, paths with $ and ~ - host-config.ts validateAllConfigs: duplicate name/hostSubdir/globalRoot detection, error prefix format, real configs pass - HOST_PATHS derivation: env vars for external hosts, literal paths for Claude, localSkillRoot matches config, every host has entry - host-config-export.ts CLI: list, get (string/boolean/array), detect, validate, symlinks, error cases (missing args, unknown field/host) - Golden-file regression: claude/codex/factory ship SKILL.md vs baselines - Individual host config correctness: prefixable, linkingStrategy, usesEnvVars, description limits, metadata, sidecar, tool rewrites, conditional fields, suppressed resolvers, boundary instruction, co-author trailers, skip rules, path rewrites, runtime root assets Combined with the 35 parameterized smoke tests from gen-skill-docs.test.ts, total new test coverage for multi-host: 106 tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: update golden baselines and sync version after merge from main Golden files refreshed to match post-merge generated output. package.json version synced to VERSION file (0.15.4.0). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.15.5.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: sidebar E2E tests now self-contained and passing - sidebar-url-accuracy: fix stale assertion that expected extensionUrl in prompt text (prompt format changed, URL is now in pageUrl field) - sidebar-css-interaction: simplify task from multi-step HN comment navigation to single-page example.com style injection (faster, more reliable, still exercises goto + style + completion flow) - Update golden baselines after merge from main All 3 sidebar tests now pass: 3/3, 0 fail, ~36s total. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add ADDING_A_HOST.md guide + update docs for multi-host system - docs/ADDING_A_HOST.md: step-by-step guide for adding a new host (create config, register, gitignore, generate, test). Covers the full HostConfig interface, adapter pattern, and validation. - CONTRIBUTING.md: replace stale "Dual-host development" section with "Multi-host development" covering all 8 hosts and linking to the guide. - README.md: consolidate Codex/Factory install sections into one "Other AI Agents" section listing all supported hosts with auto-detect. - CLAUDE.md: add hosts/, host-config.ts, host-adapters/, contrib/ to project structure tree. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: README per-host install instructions for all 8 agents Each supported agent now has its own copy-paste install block with the exact command and where skills end up on disk. Includes: auto-detect, Codex, OpenCode, Cursor, Factory, OpenClaw, Slate, and Kiro. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
472 lines
18 KiB
TypeScript
472 lines
18 KiB
TypeScript
/**
|
|
* Layer 4: E2E tests for the sidebar agent.
|
|
*
|
|
* sidebar-url-accuracy: Deterministic test that verifies the activeTabUrl fix.
|
|
* Starts server (no browser), POSTs to /sidebar-command with different activeTabUrl
|
|
* values, reads the queue file, and verifies the prompt uses the extension URL.
|
|
* No real Claude needed — this is a fast, cheap, deterministic test.
|
|
*
|
|
* sidebar-navigate: Full E2E with real Claude (requires ANTHROPIC_API_KEY).
|
|
* Starts server + sidebar-agent, sends a message, waits for Claude to respond.
|
|
* Tests the complete message flow through the queue.
|
|
*/
|
|
|
|
import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
|
|
import { spawn, type Subprocess } from 'bun';
|
|
import * as fs from 'fs';
|
|
import * as os from 'os';
|
|
import * as path from 'path';
|
|
import {
|
|
ROOT,
|
|
describeIfSelected, testIfSelected,
|
|
createEvalCollector, finalizeEvalCollector,
|
|
} from './helpers/e2e-helpers';
|
|
|
|
const evalCollector = createEvalCollector('e2e-sidebar');
|
|
|
|
// --- Sidebar URL Accuracy (deterministic, no Claude) ---
|
|
|
|
describeIfSelected('Sidebar URL accuracy E2E', ['sidebar-url-accuracy'], () => {
|
|
let serverProc: Subprocess | null = null;
|
|
let serverPort: number = 0;
|
|
let authToken: string = '';
|
|
let tmpDir: string = '';
|
|
let stateFile: string = '';
|
|
let queueFile: string = '';
|
|
|
|
async function api(pathname: string, opts: RequestInit = {}): Promise<Response> {
|
|
const headers: Record<string, string> = {
|
|
'Content-Type': 'application/json',
|
|
...(opts.headers as Record<string, string> || {}),
|
|
};
|
|
if (!headers['Authorization'] && authToken) {
|
|
headers['Authorization'] = `Bearer ${authToken}`;
|
|
}
|
|
return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
|
|
}
|
|
|
|
beforeAll(async () => {
|
|
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-e2e-url-'));
|
|
stateFile = path.join(tmpDir, 'browse.json');
|
|
queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
|
|
fs.mkdirSync(path.dirname(queueFile), { recursive: true });
|
|
|
|
const serverScript = path.resolve(ROOT, 'browse', 'src', 'server.ts');
|
|
serverProc = spawn(['bun', 'run', serverScript], {
|
|
env: {
|
|
...process.env,
|
|
BROWSE_STATE_FILE: stateFile,
|
|
BROWSE_HEADLESS_SKIP: '1',
|
|
BROWSE_PORT: '0',
|
|
SIDEBAR_QUEUE_PATH: queueFile,
|
|
BROWSE_IDLE_TIMEOUT: '300',
|
|
},
|
|
stdio: ['ignore', 'pipe', 'pipe'],
|
|
});
|
|
|
|
const deadline = Date.now() + 15000;
|
|
while (Date.now() < deadline) {
|
|
if (fs.existsSync(stateFile)) {
|
|
try {
|
|
const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
|
|
if (state.port && state.token) {
|
|
serverPort = state.port;
|
|
authToken = state.token;
|
|
break;
|
|
}
|
|
} catch {}
|
|
}
|
|
await new Promise(r => setTimeout(r, 100));
|
|
}
|
|
if (!serverPort) throw new Error('Server did not start in time');
|
|
}, 20000);
|
|
|
|
afterAll(() => {
|
|
if (serverProc) { try { serverProc.kill(); } catch {} }
|
|
finalizeEvalCollector(evalCollector);
|
|
try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
|
|
});
|
|
|
|
testIfSelected('sidebar-url-accuracy', async () => {
|
|
// Fresh session
|
|
await api('/sidebar-session/new', { method: 'POST' });
|
|
fs.writeFileSync(queueFile, '');
|
|
|
|
const extensionUrl = 'https://example.com/user-navigated-here';
|
|
const resp = await api('/sidebar-command', {
|
|
method: 'POST',
|
|
body: JSON.stringify({
|
|
message: 'What page am I on?',
|
|
activeTabUrl: extensionUrl,
|
|
}),
|
|
});
|
|
expect(resp.status).toBe(200);
|
|
|
|
// Wait for queue entry
|
|
let lastEntry: any = null;
|
|
const deadline = Date.now() + 5000;
|
|
while (Date.now() < deadline) {
|
|
await new Promise(r => setTimeout(r, 100));
|
|
if (!fs.existsSync(queueFile)) continue;
|
|
const lines = fs.readFileSync(queueFile, 'utf-8').trim().split('\n').filter(Boolean);
|
|
if (lines.length > 0) {
|
|
lastEntry = JSON.parse(lines[lines.length - 1]);
|
|
break;
|
|
}
|
|
}
|
|
|
|
expect(lastEntry).not.toBeNull();
|
|
// Extension URL should be used, not the Playwright fallback.
|
|
// The pageUrl field carries the extension URL; the prompt itself
|
|
// contains only the system prompt + user message (URL is metadata).
|
|
expect(lastEntry.pageUrl).toBe(extensionUrl);
|
|
expect(lastEntry.pageUrl).not.toBe('about:blank');
|
|
|
|
// Also test: chrome:// URL should be rejected, falling back to about:blank
|
|
await api('/sidebar-agent/kill', { method: 'POST' });
|
|
fs.writeFileSync(queueFile, '');
|
|
|
|
await api('/sidebar-command', {
|
|
method: 'POST',
|
|
body: JSON.stringify({
|
|
message: 'test',
|
|
activeTabUrl: 'chrome://settings',
|
|
}),
|
|
});
|
|
await new Promise(r => setTimeout(r, 200));
|
|
const lines2 = fs.readFileSync(queueFile, 'utf-8').trim().split('\n').filter(Boolean);
|
|
if (lines2.length > 0) {
|
|
const entry2 = JSON.parse(lines2[lines2.length - 1]);
|
|
expect(entry2.pageUrl).toBe('about:blank');
|
|
}
|
|
|
|
evalCollector?.addTest({
|
|
name: 'sidebar-url-accuracy', suite: 'Sidebar URL accuracy E2E', tier: 'e2e',
|
|
passed: true,
|
|
duration_ms: 0,
|
|
cost_usd: 0,
|
|
exit_reason: 'success',
|
|
});
|
|
}, 30_000);
|
|
});
|
|
|
|
// --- Sidebar CSS Interaction E2E (real Claude + real browser) ---
|
|
// Goes to HN, reads comments, identifies the most insightful one, highlights it.
|
|
// Exercises: navigation, snapshot, text reading, LLM judgment, CSS style injection.
|
|
|
|
describeIfSelected('Sidebar CSS interaction E2E', ['sidebar-css-interaction'], () => {
|
|
let serverProc: Subprocess | null = null;
|
|
let agentProc: Subprocess | null = null;
|
|
let serverPort: number = 0;
|
|
let authToken: string = '';
|
|
let tmpDir: string = '';
|
|
let stateFile: string = '';
|
|
let queueFile: string = '';
|
|
let serverLogFile: string = '';
|
|
let serverErrFile: string = '';
|
|
let agentLogFile: string = '';
|
|
let agentErrFile: string = '';
|
|
|
|
async function api(pathname: string, opts: RequestInit = {}): Promise<Response> {
|
|
const headers: Record<string, string> = {
|
|
'Content-Type': 'application/json',
|
|
...(opts.headers as Record<string, string> || {}),
|
|
};
|
|
if (!headers['Authorization'] && authToken) {
|
|
headers['Authorization'] = `Bearer ${authToken}`;
|
|
}
|
|
return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
|
|
}
|
|
|
|
beforeAll(async () => {
|
|
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-e2e-css-'));
|
|
stateFile = path.join(tmpDir, 'browse.json');
|
|
queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
|
|
fs.mkdirSync(path.dirname(queueFile), { recursive: true });
|
|
|
|
// Start server WITH a real browser for CSS interaction
|
|
const serverScript = path.resolve(ROOT, 'browse', 'src', 'server.ts');
|
|
serverLogFile = path.join(tmpDir, 'server.log');
|
|
serverErrFile = path.join(tmpDir, 'server.err');
|
|
// Use 'pipe' stdio — closing file descriptors kills the child on macOS/bun
|
|
serverProc = spawn(['bun', 'run', serverScript], {
|
|
env: {
|
|
...process.env,
|
|
BROWSE_STATE_FILE: stateFile,
|
|
BROWSE_PORT: '0',
|
|
SIDEBAR_QUEUE_PATH: queueFile,
|
|
BROWSE_IDLE_TIMEOUT: '600000', // 10 min in ms — test takes ~3 min
|
|
},
|
|
stdio: ['ignore', 'pipe', 'pipe'],
|
|
});
|
|
|
|
// Wait for state file with port/token
|
|
const deadline = Date.now() + 30000;
|
|
while (Date.now() < deadline) {
|
|
if (fs.existsSync(stateFile)) {
|
|
try {
|
|
const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
|
|
if (state.port && state.token) {
|
|
serverPort = state.port;
|
|
authToken = state.token;
|
|
break;
|
|
}
|
|
} catch {}
|
|
}
|
|
await new Promise(r => setTimeout(r, 200));
|
|
}
|
|
if (!serverPort) throw new Error('Server did not start in time');
|
|
|
|
// Verify server is healthy before proceeding
|
|
const healthDeadline = Date.now() + 10000;
|
|
let healthy = false;
|
|
while (Date.now() < healthDeadline) {
|
|
try {
|
|
const resp = await fetch(`http://127.0.0.1:${serverPort}/health`);
|
|
if (resp.ok) { healthy = true; break; }
|
|
} catch {}
|
|
await new Promise(r => setTimeout(r, 500));
|
|
}
|
|
if (!healthy) throw new Error('Server started but health check failed');
|
|
|
|
// Start sidebar-agent with the real browse binary
|
|
const agentScript = path.resolve(ROOT, 'browse', 'src', 'sidebar-agent.ts');
|
|
const browseBin = path.resolve(ROOT, 'browse', 'dist', 'browse');
|
|
agentLogFile = path.join(tmpDir, 'agent.log');
|
|
agentErrFile = path.join(tmpDir, 'agent.err');
|
|
// Use 'pipe' stdio — closing file descriptors kills the child on macOS/bun
|
|
agentProc = spawn(['bun', 'run', agentScript], {
|
|
env: {
|
|
...process.env,
|
|
BROWSE_SERVER_PORT: String(serverPort),
|
|
BROWSE_STATE_FILE: stateFile,
|
|
SIDEBAR_QUEUE_PATH: queueFile,
|
|
SIDEBAR_AGENT_TIMEOUT: '180000', // 3 min — multi-step HN comment task
|
|
BROWSE_BIN: fs.existsSync(browseBin) ? browseBin : 'echo',
|
|
},
|
|
stdio: ['ignore', 'pipe', 'pipe'],
|
|
});
|
|
|
|
await new Promise(r => setTimeout(r, 2000));
|
|
}, 35000);
|
|
|
|
afterAll(() => {
|
|
if (agentProc) { try { agentProc.kill(); } catch {} }
|
|
if (serverProc) { try { serverProc.kill(); } catch {} }
|
|
finalizeEvalCollector(evalCollector);
|
|
try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
|
|
});
|
|
|
|
testIfSelected('sidebar-css-interaction', async () => {
|
|
// Fresh session + clean queue
|
|
try { await api('/sidebar-session/new', { method: 'POST' }); } catch {}
|
|
fs.writeFileSync(queueFile, '');
|
|
const startTime = Date.now();
|
|
|
|
// Simple task: go to example.com, read the title, apply a style
|
|
// (much faster than multi-step HN comment navigation)
|
|
const resp = await api('/sidebar-command', {
|
|
method: 'POST',
|
|
body: JSON.stringify({
|
|
message: 'Go to https://example.com. Read the page title. Add a 4px solid orange outline to the h1 element.',
|
|
activeTabUrl: 'about:blank',
|
|
}),
|
|
});
|
|
expect(resp.status).toBe(200);
|
|
|
|
// Poll for agent_done (4 min timeout — multi-step task with opus LLM)
|
|
const deadline = Date.now() + 240000;
|
|
let entries: any[] = [];
|
|
while (Date.now() < deadline) {
|
|
try {
|
|
const chatResp = await api('/sidebar-chat?after=0');
|
|
const data = await chatResp.json();
|
|
entries = data.entries || [];
|
|
if (entries.some((e: any) => e.type === 'agent_done')) break;
|
|
} catch (err: any) {
|
|
// Server may be temporarily busy or restarting — retry on connection errors
|
|
const isConnErr = err.code === 'ConnectionRefused' || err.message?.includes('ConnectionRefused') || err.message?.includes('Unable to connect');
|
|
if (!isConnErr) throw err;
|
|
}
|
|
await new Promise(r => setTimeout(r, 3000));
|
|
}
|
|
|
|
const duration = Date.now() - startTime;
|
|
const doneEntry = entries.find((e: any) => e.type === 'agent_done');
|
|
|
|
// Dump debug info on failure
|
|
if (!doneEntry || entries.length === 0) {
|
|
console.log('ENTRIES:', JSON.stringify(entries.slice(-5), null, 2));
|
|
console.log('SERVER exitCode:', serverProc?.exitCode, 'signalCode:', serverProc?.signalCode, 'killed:', serverProc?.killed);
|
|
console.log('AGENT exitCode:', agentProc?.exitCode, 'signalCode:', agentProc?.signalCode, 'killed:', agentProc?.killed);
|
|
const queueContent = fs.existsSync(queueFile) ? fs.readFileSync(queueFile, 'utf-8').slice(-500) : 'NO QUEUE';
|
|
console.log('QUEUE:', queueContent.length > 0 ? 'has entries' : 'empty');
|
|
}
|
|
|
|
// Agent should have completed
|
|
expect(doneEntry).toBeDefined();
|
|
|
|
// Agent should have run browse commands (look for tool_use entries)
|
|
const toolUses = entries.filter((e: any) => e.type === 'tool_use');
|
|
expect(toolUses.length).toBeGreaterThanOrEqual(2); // At minimum: goto + one more
|
|
|
|
// Agent text should mention something about the comment it found
|
|
const agentText = entries
|
|
.filter((e: any) => e.role === 'agent' && (e.type === 'text' || e.type === 'result'))
|
|
.map((e: any) => e.text || '')
|
|
.join(' ')
|
|
.toLowerCase();
|
|
|
|
// Should have navigated to example.com (look for example.com in any entry text)
|
|
const allEntryText = entries
|
|
.map((e: any) => `${e.text || ''} ${e.input || ''} ${e.message || ''}`)
|
|
.join(' ');
|
|
const navigatedToTarget = allEntryText.includes('example.com') || allEntryText.includes('Example Domain');
|
|
if (!navigatedToTarget) {
|
|
console.log('ALL ENTRY TEXT (first 2000):', allEntryText.slice(0, 2000));
|
|
}
|
|
expect(navigatedToTarget).toBe(true);
|
|
|
|
// Should have applied a style (look for orange/outline in tool commands)
|
|
const allText = entries.map((e: any) => e.text || '').join(' ');
|
|
const appliedStyle = allText.includes('outline') || allText.includes('orange') || allText.includes('style');
|
|
|
|
evalCollector?.addTest({
|
|
name: 'sidebar-css-interaction', suite: 'Sidebar CSS interaction E2E', tier: 'e2e',
|
|
passed: !!doneEntry && navigatedToTarget && appliedStyle,
|
|
duration_ms: duration,
|
|
cost_usd: 0,
|
|
exit_reason: doneEntry ? 'success' : 'timeout',
|
|
});
|
|
}, 300_000);
|
|
});
|
|
|
|
// --- Sidebar Navigate (real Claude, requires ANTHROPIC_API_KEY) ---
|
|
|
|
describeIfSelected('Sidebar navigate E2E', ['sidebar-navigate'], () => {
|
|
let serverProc: Subprocess | null = null;
|
|
let agentProc: Subprocess | null = null;
|
|
let serverPort: number = 0;
|
|
let authToken: string = '';
|
|
let tmpDir: string = '';
|
|
let stateFile: string = '';
|
|
let queueFile: string = '';
|
|
|
|
async function api(pathname: string, opts: RequestInit = {}): Promise<Response> {
|
|
const headers: Record<string, string> = {
|
|
'Content-Type': 'application/json',
|
|
...(opts.headers as Record<string, string> || {}),
|
|
};
|
|
if (!headers['Authorization'] && authToken) {
|
|
headers['Authorization'] = `Bearer ${authToken}`;
|
|
}
|
|
return fetch(`http://127.0.0.1:${serverPort}${pathname}`, { ...opts, headers });
|
|
}
|
|
|
|
beforeAll(async () => {
|
|
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'sidebar-e2e-nav-'));
|
|
stateFile = path.join(tmpDir, 'browse.json');
|
|
queueFile = path.join(tmpDir, 'sidebar-queue.jsonl');
|
|
fs.mkdirSync(path.dirname(queueFile), { recursive: true });
|
|
|
|
// Start server WITHOUT headless skip — we need a real browser for Claude to use
|
|
const serverScript = path.resolve(ROOT, 'browse', 'src', 'server.ts');
|
|
serverProc = spawn(['bun', 'run', serverScript], {
|
|
env: {
|
|
...process.env,
|
|
BROWSE_STATE_FILE: stateFile,
|
|
BROWSE_HEADLESS_SKIP: '1', // Still skip browser — Claude uses curl/fetch instead
|
|
BROWSE_PORT: '0',
|
|
SIDEBAR_QUEUE_PATH: queueFile,
|
|
BROWSE_IDLE_TIMEOUT: '300',
|
|
},
|
|
stdio: ['ignore', 'pipe', 'pipe'],
|
|
});
|
|
|
|
const deadline = Date.now() + 15000;
|
|
while (Date.now() < deadline) {
|
|
if (fs.existsSync(stateFile)) {
|
|
try {
|
|
const state = JSON.parse(fs.readFileSync(stateFile, 'utf-8'));
|
|
if (state.port && state.token) {
|
|
serverPort = state.port;
|
|
authToken = state.token;
|
|
break;
|
|
}
|
|
} catch {}
|
|
}
|
|
await new Promise(r => setTimeout(r, 100));
|
|
}
|
|
if (!serverPort) throw new Error('Server did not start in time');
|
|
|
|
// Start sidebar-agent
|
|
const agentScript = path.resolve(ROOT, 'browse', 'src', 'sidebar-agent.ts');
|
|
agentProc = spawn(['bun', 'run', agentScript], {
|
|
env: {
|
|
...process.env,
|
|
BROWSE_SERVER_PORT: String(serverPort),
|
|
BROWSE_STATE_FILE: stateFile,
|
|
SIDEBAR_QUEUE_PATH: queueFile,
|
|
SIDEBAR_AGENT_TIMEOUT: '90000',
|
|
BROWSE_BIN: 'echo', // browse commands won't work, but Claude can use curl
|
|
},
|
|
stdio: ['ignore', 'pipe', 'pipe'],
|
|
});
|
|
|
|
await new Promise(r => setTimeout(r, 1500));
|
|
}, 25000);
|
|
|
|
afterAll(() => {
|
|
if (agentProc) { try { agentProc.kill(); } catch {} }
|
|
if (serverProc) { try { serverProc.kill(); } catch {} }
|
|
finalizeEvalCollector(evalCollector);
|
|
try { fs.rmSync(tmpDir, { recursive: true, force: true }); } catch {}
|
|
});
|
|
|
|
testIfSelected('sidebar-navigate', async () => {
|
|
await api('/sidebar-session/new', { method: 'POST' });
|
|
fs.writeFileSync(queueFile, '');
|
|
const startTime = Date.now();
|
|
|
|
// Ask Claude a simple question — it doesn't need browse commands for this
|
|
const resp = await api('/sidebar-command', {
|
|
method: 'POST',
|
|
body: JSON.stringify({
|
|
message: 'Say exactly "SIDEBAR_TEST_OK" and nothing else.',
|
|
activeTabUrl: 'https://example.com',
|
|
}),
|
|
});
|
|
expect(resp.status).toBe(200);
|
|
|
|
// Poll for agent_done
|
|
const deadline = Date.now() + 90000;
|
|
let entries: any[] = [];
|
|
while (Date.now() < deadline) {
|
|
const chatResp = await api('/sidebar-chat?after=0');
|
|
const data = await chatResp.json();
|
|
entries = data.entries;
|
|
if (entries.some((e: any) => e.type === 'agent_done')) break;
|
|
await new Promise(r => setTimeout(r, 2000));
|
|
}
|
|
|
|
const duration = Date.now() - startTime;
|
|
const doneEntry = entries.find((e: any) => e.type === 'agent_done');
|
|
expect(doneEntry).toBeDefined();
|
|
|
|
// Claude should have responded with something
|
|
const agentText = entries
|
|
.filter((e: any) => e.role === 'agent' && (e.type === 'text' || e.type === 'result'))
|
|
.map((e: any) => e.text || '')
|
|
.join(' ');
|
|
expect(agentText.length).toBeGreaterThan(0);
|
|
|
|
evalCollector?.addTest({
|
|
name: 'sidebar-navigate', suite: 'Sidebar navigate E2E', tier: 'e2e',
|
|
passed: !!doneEntry && agentText.length > 0,
|
|
duration_ms: duration,
|
|
cost_usd: 0,
|
|
exit_reason: doneEntry ? 'success' : 'timeout',
|
|
});
|
|
}, 120_000);
|
|
});
|