mirror of https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
b73f364411
* refactor: extract path-security.ts shared module

  validateOutputPath, validateReadPath, and SAFE_DIRECTORIES were duplicated
  across write-commands.ts, meta-commands.ts, and read-commands.ts. Extract
  them to a single shared module with re-exports for backward compatibility.
  Also adds validateTempPath() for the upcoming GET /file endpoint (TEMP_DIR
  only, not cwd, to prevent remote agents from reading project files).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: default paired agents to full access, split SCOPE_CONTROL

  The trust boundary for paired agents is the pairing ceremony itself, not
  the scope. An agent with write scope can already click anything and
  navigate anywhere, so gating js/cookies behind --admin was security theater.

  Changes:
  - Default pair scopes: read+write+admin+meta (was read+write)
  - New SCOPE_CONTROL for browser-wide destructive ops (stop, restart,
    disconnect, state, handoff, resume, connect)
  - --admin flag now grants control scope (backward compat)
  - New --restrict flag for limited access (e.g., --restrict read)
  - Updated hint text: "re-pair with --control" instead of "--admin"

* feat: add media and data commands for page content extraction

  media: discovers all img/video/audio/background-image elements on the
  page. Returns JSON with URLs, dimensions, srcset, loading state, and
  HLS/DASH detection. Supports --images/--videos/--audio filters and
  optional CSS selector scoping.

  data: extracts structured data embedded in pages (JSON-LD, Open Graph,
  Twitter Cards, meta tags). One command returns product prices, article
  metadata, and social share info without DOM scraping.

  Both are READ scope with untrusted content wrapping. A shared
  media-extract.ts helper is reused by the upcoming scrape command.

* feat: add download, scrape, and archive commands

  download: fetches any URL or @ref element to disk using browser session
  cookies via page.request.fetch(). Supports blob: URLs via in-page base64
  conversion. The --base64 flag returns an inline data URI (capped at 10MB).
  Detects HLS/DASH and rejects with a yt-dlp hint.

  scrape: bulk media download composing media discovery with a download
  loop. Sequential with a 100ms delay, URL deduplication, and a configurable
  --limit. Writes manifest.json with per-file metadata for machine
  consumption.

  archive: saves the complete page as MHTML via CDP Page.captureSnapshot.
  No silent fallback; errors clearly if CDP is unavailable.

  All three are WRITE scope (they write to disk and are blocked in watch
  mode).

* feat: add GET /file endpoint for remote agent file retrieval

  Remote paired agents can now retrieve downloaded files over HTTP.
  TEMP_DIR only (not cwd) to prevent project file exfiltration.

  - Bearer token auth (root or scoped with read scope)
  - Path validation via validateTempPath() (symlink-aware)
  - 200MB size cap
  - Extension-based MIME detection
  - Zero-copy streaming via Bun.file()

* feat: add scroll --times N for automated repeated scrolling

  Extends the scroll command with a --times N flag for infinite-feed
  scraping. Scrolls N times with a configurable --wait delay (default
  1000ms) between scrolls for content loading.

  Usage:
    scroll --times 10
    scroll --times 5 --wait 2000
    scroll --times 3 .feed-container

  Composable with scrape: scroll to load content, then scrape images.

* feat: add network response body capture (--capture/--export/--bodies)

  The killer feature for social media scraping. Extends the existing
  network command to intercept API response bodies:

    network --capture [--filter graphql]   # start capturing
    network --capture stop                 # stop
    network --export /tmp/api.jsonl        # export as JSONL
    network --bodies                       # show summary

  Uses a page.on('response') listener with URL pattern filtering.
  SizeCappedBuffer (50MB total, 5MB per-entry cap) evicts the oldest
  entries when full. Binary responses are stored as base64, text as-is.

  This lets agents tap Instagram's GraphQL API, TikTok's hydration data,
  and any SPA's internal API responses instead of relying on fragile DOM
  scraping.

* feat: add screenshot --base64 for inline image return

  Returns data:image/png;base64,... instead of writing to disk, capped at
  10MB. Works with all screenshot modes (element, clip, viewport).
  Eliminates the two-step screenshot-plus-file-serve dance for remote
  agents.

* test: add data platform tests and media fixture

  Tests for SizeCappedBuffer (eviction, export, summary), validateTempPath
  (TEMP_DIR only, rejects cwd), command registration (all new commands in
  the correct scope sets), and MIME mapping source checks.

  Rich HTML fixture with standard images, lazy-loaded images, srcset, video
  with sources plus HLS, audio, CSS background-images, JSON-LD, Open Graph,
  Twitter Cards, and meta tags.

* docs: regenerate SKILL.md with Extraction category

  Adds the Extraction category to the browse command table ordering and
  regenerates the SKILL.md files to include the media, data, download,
  scrape, and archive commands in the generated documentation.

* chore: bump version and changelog (v0.16.0.0)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
177 lines
6.4 KiB
TypeScript
/**
 * Tests for the browser data platform: media extraction, network capture,
 * path security, and structured data extraction.
 */

import { describe, it, expect } from 'bun:test';
import { SizeCappedBuffer, type CapturedResponse } from '../src/network-capture';
import { validateTempPath, validateOutputPath, validateReadPath } from '../src/path-security';
import { TEMP_DIR } from '../src/platform';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';

// ─── SizeCappedBuffer ─────────────────────────────────────────

describe('SizeCappedBuffer', () => {
  function makeEntry(size: number, url = 'https://example.com'): CapturedResponse {
    return {
      url,
      status: 200,
      headers: {},
      body: 'x'.repeat(size),
      contentType: 'text/plain',
      timestamp: Date.now(),
      size,
      bodyTruncated: false,
    };
  }

  it('stores entries within capacity', () => {
    const buf = new SizeCappedBuffer(1000);
    buf.push(makeEntry(100));
    buf.push(makeEntry(200));
    expect(buf.length).toBe(2);
    expect(buf.byteSize).toBe(300);
  });

  it('evicts oldest entries when over capacity', () => {
    const buf = new SizeCappedBuffer(500);
    buf.push(makeEntry(200, 'https://a.com'));
    buf.push(makeEntry(200, 'https://b.com'));
    buf.push(makeEntry(200, 'https://c.com')); // should evict first entry
    expect(buf.length).toBe(2);
    const urls = buf.toArray().map(e => e.url);
    expect(urls).toContain('https://b.com');
    expect(urls).toContain('https://c.com');
    expect(urls).not.toContain('https://a.com');
  });

  it('evicts multiple entries for one large push', () => {
    const buf = new SizeCappedBuffer(500);
    buf.push(makeEntry(100));
    buf.push(makeEntry(100));
    buf.push(makeEntry(100));
    buf.push(makeEntry(400)); // evicts first two (need totalSize + 400 <= 500, so totalSize <= 100)
    expect(buf.length).toBe(2); // one 100-byte entry + one 400-byte entry
    expect(buf.byteSize).toBe(500);
  });

  it('clear resets buffer', () => {
    const buf = new SizeCappedBuffer(1000);
    buf.push(makeEntry(100));
    buf.push(makeEntry(200));
    buf.clear();
    expect(buf.length).toBe(0);
    expect(buf.byteSize).toBe(0);
  });

  it('exports to JSONL file', () => {
    const buf = new SizeCappedBuffer(1000);
    buf.push(makeEntry(10, 'https://a.com'));
    buf.push(makeEntry(20, 'https://b.com'));

    const tmpFile = path.join(os.tmpdir(), `test-export-${Date.now()}.jsonl`);
    try {
      const count = buf.exportToFile(tmpFile);
      expect(count).toBe(2);
      const lines = fs.readFileSync(tmpFile, 'utf-8').trim().split('\n');
      expect(lines.length).toBe(2);
      const parsed = JSON.parse(lines[0]);
      expect(parsed.url).toBe('https://a.com');
    } finally {
      fs.unlinkSync(tmpFile);
    }
  });

  it('summary shows entries', () => {
    const buf = new SizeCappedBuffer(1000);
    buf.push(makeEntry(1024, 'https://api.example.com/graphql'));
    const summary = buf.summary();
    expect(summary).toContain('1 responses');
    expect(summary).toContain('graphql');
    expect(summary).toContain('1KB');
  });

  it('summary shows empty message when no entries', () => {
    const buf = new SizeCappedBuffer(1000);
    expect(buf.summary()).toBe('No captured responses.');
  });
});

// ─── validateTempPath ─────────────────────────────────────────

describe('validateTempPath', () => {
  let tmpFile: string;

  it('allows paths within /tmp that exist', () => {
    tmpFile = path.join(TEMP_DIR, `test-temp-path-${Date.now()}.jpg`);
    fs.writeFileSync(tmpFile, 'test');
    try {
      expect(() => validateTempPath(tmpFile)).not.toThrow();
    } finally {
      fs.unlinkSync(tmpFile);
    }
  });

  it('rejects non-existent files', () => {
    expect(() => validateTempPath('/tmp/nonexistent-file-12345.jpg')).toThrow(/not found/i);
  });

  it('rejects paths in cwd', () => {
    // Use a file that actually exists in cwd, so the path check
    // (not the existence check) is what fires
    const cwdFile = path.join(process.cwd(), 'package.json');
    expect(() => validateTempPath(cwdFile)).toThrow(/temp directory/i);
  });

  it('rejects absolute paths outside safe dirs', () => {
    expect(() => validateTempPath('/etc/passwd')).toThrow();
  });
});

// ─── Command registration ─────────────────────────────────────

describe('command registration', () => {
  it('all new commands have descriptions', () => {
    // The load-time validation in commands.ts throws if any command
    // is missing from COMMAND_DESCRIPTIONS. If this import succeeds,
    // all commands are properly registered.
    const { COMMAND_DESCRIPTIONS, ALL_COMMANDS } = require('../src/commands');
    const newCommands = ['media', 'data', 'download', 'scrape', 'archive'];
    for (const cmd of newCommands) {
      expect(ALL_COMMANDS.has(cmd)).toBe(true);
      expect(COMMAND_DESCRIPTIONS[cmd]).toBeTruthy();
    }
  });

  it('new commands are in correct scope sets', () => {
    const { SCOPE_READ, SCOPE_WRITE } = require('../src/token-registry');
    expect(SCOPE_READ.has('media')).toBe(true);
    expect(SCOPE_READ.has('data')).toBe(true);
    expect(SCOPE_WRITE.has('download')).toBe(true);
    expect(SCOPE_WRITE.has('scrape')).toBe(true);
    expect(SCOPE_WRITE.has('archive')).toBe(true);
  });

  it('media and data are in PAGE_CONTENT_COMMANDS', () => {
    const { PAGE_CONTENT_COMMANDS } = require('../src/commands');
    expect(PAGE_CONTENT_COMMANDS.has('media')).toBe(true);
    expect(PAGE_CONTENT_COMMANDS.has('data')).toBe(true);
  });
});

// ─── MIME type mapping ─────────────────────────────────────────

describe('mimeToExt', () => {
  // mimeToExt is a private function in write-commands.ts,
  // so we test it indirectly through command behavior.
  // This test verifies the source contains the expected mappings.
  it('write-commands.ts contains MIME mappings', () => {
    const src = fs.readFileSync(path.join(import.meta.dir, '../src/write-commands.ts'), 'utf-8');
    expect(src).toContain("'image/png': '.png'");
    expect(src).toContain("'image/jpeg': '.jpg'");
    expect(src).toContain("'video/mp4': '.mp4'");
    expect(src).toContain("'audio/mpeg': '.mp3'");
  });
});