mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
b343ba2797
* fix(security): skip hidden directories in skill template discovery
discoverTemplates() scans subdirectories for SKILL.md.tmpl files but
only skips node_modules, .git, and dist. Hidden directories like
.claude/, .agents/, and .codex/ (which contain symlinked skill
installs) were being scanned, allowing a malicious .tmpl in a
symlinked skill to inject into the generation pipeline.
Fix: add !d.name.startsWith('.') to the subdirs() filter. This skips
all dot-prefixed directories, matching the standard convention that
hidden dirs are not source code.
* fix(security): sanitize telemetry JSONL inputs against injection
SKILL, OUTCOME, SESSION_ID, SOURCE, and EVENT_TYPE values go directly
into printf %s for JSONL output. If any contain double quotes,
backslashes, or newlines, the JSON breaks — or worse, injects
arbitrary fields.
Fix: strip quotes, backslashes, and control characters from all
string fields before JSONL construction via json_safe() helper.
* fix(security): validate JSON input in gstack-review-log
gstack-review-log appends its argument directly to a JSONL file with
no validation. Malformed or crafted input could corrupt the review log
or inject arbitrary content.
Fix: validate input is parseable JSON via python3 before appending.
Reject with exit 1 and stderr message if invalid.
* fix: treat relative dot-paths as file paths in screenshot command
Closes #495
* fix: use host-specific co-author trailer in /ship and /document-release
Codex-generated skills hardcoded a Claude co-author trailer in commit
messages. Users running gstack under Codex pushed commits attributed
to the wrong AI assistant.
Add {{CO_AUTHOR_TRAILER}} resolver that emits the correct trailer
based on ctx.host:
- claude: Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- codex: Co-Authored-By: OpenAI Codex <noreply@openai.com>
Replace hardcoded trailers in ship/SKILL.md.tmpl and
document-release/SKILL.md.tmpl with the resolver placeholder.
Fixes #282. Fixes #383.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: auto-upgrade marker no longer masks newer remote versions
When a just-upgraded-from marker persists across sessions, the update
check would write UP_TO_DATE to cache and exit immediately — never
fetching the remote VERSION. Users silently miss updates that landed
after their last upgrade.
Remove the early exit and premature cache write so the script falls
through to the remote check after consuming the marker. This ensures
JUST_UPGRADED is still emitted for the preamble, while also detecting
any newer versions available upstream.
Fixes #515
* fix: decouple doc generation from binary compilation in build script
The build script chains gen:skill-docs and bun build --compile with &&,
so a doc generation failure (e.g. missing Codex host config, template
error) prevents the browse binary from being compiled. Users end up
with a broken install where setup reports the binary is missing.
Replace && with ; for the two gen:skill-docs steps so they run
independently of the compilation chain. Doc generation errors are still
visible in stderr, but no longer block binary compilation.
Fixes #482
* fix: extend security sanitization + add 10 tests for merged community PRs
- Extend json_safe() to ERROR_CLASS and FAILED_STEP fields
- Improve ERROR_MESSAGE escaping to handle backslashes and newlines
- Replace python3 with bun for JSON validation in gstack-review-log
- Add 7 telemetry injection prevention tests
- Add 2 review-log JSON validation tests
- Add 1 discover-skills hidden directory filtering test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stabilize flaky E2E tests (browse-basic, ship-base-branch, dashboard-via)
browse-basic: bump maxTurns 5→7 (agent reads PNG per SKILL.md instruction)
ship-base-branch: extract Step 0 only instead of full 1900-line ship/SKILL.md
dashboard-via: extract dashboard section only + increase timeout 90s→180s
Root cause: copying full SKILL.md files into test fixtures caused context bloat,
leading to timeouts and flaky turn limits. Extracting only the relevant section
cut dashboard-via from timing out at 240s to finishing in 38s.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add E2E fixture extraction rule to CLAUDE.md
Never copy full SKILL.md files into E2E test fixtures. Extract only
the section the test needs. Also: run targeted evals in foreground,
never pkill and restart mid-run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: stabilize journey-think-bigger routing test
Use exact trigger phrases from plan-ceo-review skill description
("think bigger", "expand scope", "ambitious enough") instead of
the ambiguous "thinking too small". Reduce maxTurns 5→3 to cut
cost per attempt ($0.12 vs $0.25). Test remains periodic tier
since LLM routing is inherently non-deterministic.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* remove: delete journey-think-bigger routing test
Never passed reliably. Tests ambiguous routing ("think bigger" →
plan-ceo-review) but Claude legitimately answers directly instead
of invoking a skill. The other 10 journey tests cover routing
with clear, actionable signals.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.12.7.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Arun Kumar Thiagarajan <arunkt.bm14@gmail.com>
Co-authored-by: bluzername <bluzer@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Greg Jackson <gregario@users.noreply.github.com>
371 lines
14 KiB
TypeScript
371 lines
14 KiB
TypeScript
import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
|
|
import { execSync } from 'child_process';
|
|
import * as fs from 'fs';
|
|
import * as path from 'path';
|
|
import * as os from 'os';
|
|
|
|
const ROOT = path.resolve(import.meta.dir, '..');
|
|
const BIN = path.join(ROOT, 'bin');
|
|
|
|
// Each test gets a fresh temp directory for GSTACK_STATE_DIR
|
|
let tmpDir: string;
|
|
|
|
function run(cmd: string, env: Record<string, string> = {}): string {
|
|
return execSync(cmd, {
|
|
cwd: ROOT,
|
|
env: { ...process.env, GSTACK_STATE_DIR: tmpDir, GSTACK_DIR: ROOT, ...env },
|
|
encoding: 'utf-8',
|
|
timeout: 10000,
|
|
}).trim();
|
|
}
|
|
|
|
function setConfig(key: string, value: string) {
|
|
run(`${BIN}/gstack-config set ${key} ${value}`);
|
|
}
|
|
|
|
function readJsonl(): string[] {
|
|
const file = path.join(tmpDir, 'analytics', 'skill-usage.jsonl');
|
|
if (!fs.existsSync(file)) return [];
|
|
return fs.readFileSync(file, 'utf-8').trim().split('\n').filter(Boolean);
|
|
}
|
|
|
|
function parseJsonl(): any[] {
|
|
return readJsonl().map(line => JSON.parse(line));
|
|
}
|
|
|
|
beforeEach(() => {
|
|
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-tel-'));
|
|
});
|
|
|
|
afterEach(() => {
|
|
fs.rmSync(tmpDir, { recursive: true, force: true });
|
|
});
|
|
|
|
describe('gstack-telemetry-log', () => {
|
|
test('appends valid JSONL when tier=anonymous', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 142 --outcome success --session-id test-123`);
|
|
|
|
const events = parseJsonl();
|
|
expect(events).toHaveLength(1);
|
|
expect(events[0].v).toBe(1);
|
|
expect(events[0].skill).toBe('qa');
|
|
expect(events[0].duration_s).toBe(142);
|
|
expect(events[0].outcome).toBe('success');
|
|
expect(events[0].session_id).toBe('test-123');
|
|
expect(events[0].event_type).toBe('skill_run');
|
|
expect(events[0].os).toBeTruthy();
|
|
expect(events[0].gstack_version).toBeTruthy();
|
|
});
|
|
|
|
test('produces no output when tier=off', () => {
|
|
setConfig('telemetry', 'off');
|
|
run(`${BIN}/gstack-telemetry-log --skill ship --duration 30 --outcome success --session-id test-456`);
|
|
|
|
expect(readJsonl()).toHaveLength(0);
|
|
});
|
|
|
|
test('defaults to off for invalid tier value', () => {
|
|
setConfig('telemetry', 'invalid_value');
|
|
run(`${BIN}/gstack-telemetry-log --skill ship --duration 30 --outcome success --session-id test-789`);
|
|
|
|
expect(readJsonl()).toHaveLength(0);
|
|
});
|
|
|
|
test('includes installation_id for community tier', () => {
|
|
setConfig('telemetry', 'community');
|
|
run(`${BIN}/gstack-telemetry-log --skill review --duration 100 --outcome success --session-id comm-123`);
|
|
|
|
const events = parseJsonl();
|
|
expect(events).toHaveLength(1);
|
|
// installation_id should be a UUID v4 (or hex fallback)
|
|
expect(events[0].installation_id).toMatch(/^[a-f0-9-]{32,36}$/);
|
|
});
|
|
|
|
test('installation_id is null for anonymous tier', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id anon-123`);
|
|
|
|
const events = parseJsonl();
|
|
expect(events[0].installation_id).toBeNull();
|
|
});
|
|
|
|
test('includes error_class when provided', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill browse --duration 10 --outcome error --error-class timeout --session-id err-123`);
|
|
|
|
const events = parseJsonl();
|
|
expect(events[0].error_class).toBe('timeout');
|
|
expect(events[0].outcome).toBe('error');
|
|
});
|
|
|
|
test('handles missing duration gracefully', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --outcome success --session-id nodur-123`);
|
|
|
|
const events = parseJsonl();
|
|
expect(events[0].duration_s).toBeNull();
|
|
});
|
|
|
|
test('supports event_type flag', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --event-type upgrade_prompted --skill "" --outcome success --session-id up-123`);
|
|
|
|
const events = parseJsonl();
|
|
expect(events[0].event_type).toBe('upgrade_prompted');
|
|
});
|
|
|
|
test('includes local-only fields (_repo_slug, _branch)', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id local-123`);
|
|
|
|
const events = parseJsonl();
|
|
// These should be present in local JSONL
|
|
expect(events[0]).toHaveProperty('_repo_slug');
|
|
expect(events[0]).toHaveProperty('_branch');
|
|
});
|
|
|
|
// ─── json_safe() injection prevention tests ────────────────
|
|
test('sanitizes skill name with quote injection attempt', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill 'review","injected":"true' --duration 10 --outcome success --session-id inj-1`);
|
|
|
|
const lines = readJsonl();
|
|
expect(lines).toHaveLength(1);
|
|
// Must be valid JSON (no injection — quotes stripped, so no field injection possible)
|
|
const event = JSON.parse(lines[0]);
|
|
// The key check: no injected top-level property was created
|
|
expect(event).not.toHaveProperty('injected');
|
|
// Skill field should have quotes stripped but content preserved
|
|
expect(event.skill).not.toContain('"');
|
|
});
|
|
|
|
test('truncates skill name exceeding 200 chars', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
const longSkill = 'a'.repeat(250);
|
|
run(`${BIN}/gstack-telemetry-log --skill '${longSkill}' --duration 10 --outcome success --session-id trunc-1`);
|
|
|
|
const events = parseJsonl();
|
|
expect(events[0].skill.length).toBeLessThanOrEqual(200);
|
|
});
|
|
|
|
test('sanitizes outcome with newline injection attempt', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
// Use printf to pass actual newline in the argument
|
|
run(`bash -c 'OUTCOME=$(printf "success\\nfake\\":\\"true"); ${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome "$OUTCOME" --session-id inj-2'`);
|
|
|
|
const lines = readJsonl();
|
|
expect(lines).toHaveLength(1);
|
|
const event = JSON.parse(lines[0]);
|
|
expect(event).not.toHaveProperty('fake');
|
|
});
|
|
|
|
test('sanitizes session_id with backslash-quote injection', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome success --session-id 'id\\\\"","x":"y'`);
|
|
|
|
const lines = readJsonl();
|
|
expect(lines).toHaveLength(1);
|
|
const event = JSON.parse(lines[0]);
|
|
expect(event).not.toHaveProperty('x');
|
|
});
|
|
|
|
test('sanitizes error_class with quote injection', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-class 'timeout","extra":"val' --session-id inj-3`);
|
|
|
|
const lines = readJsonl();
|
|
expect(lines).toHaveLength(1);
|
|
const event = JSON.parse(lines[0]);
|
|
expect(event).not.toHaveProperty('extra');
|
|
});
|
|
|
|
test('sanitizes failed_step with quote injection', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --failed-step 'step1","hacked":"yes' --session-id inj-4`);
|
|
|
|
const lines = readJsonl();
|
|
expect(lines).toHaveLength(1);
|
|
const event = JSON.parse(lines[0]);
|
|
expect(event).not.toHaveProperty('hacked');
|
|
});
|
|
|
|
test('escapes error_message quotes and preserves content', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-message 'Error: file "test.txt" not found' --session-id inj-5`);
|
|
|
|
const lines = readJsonl();
|
|
expect(lines).toHaveLength(1);
|
|
const event = JSON.parse(lines[0]);
|
|
expect(event.error_message).toContain('file');
|
|
expect(event.error_message).toContain('not found');
|
|
});
|
|
|
|
test('creates analytics directory if missing', () => {
|
|
// Remove analytics dir
|
|
const analyticsDir = path.join(tmpDir, 'analytics');
|
|
if (fs.existsSync(analyticsDir)) fs.rmSync(analyticsDir, { recursive: true });
|
|
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id mkdir-123`);
|
|
|
|
expect(fs.existsSync(analyticsDir)).toBe(true);
|
|
expect(readJsonl()).toHaveLength(1);
|
|
});
|
|
});
|
|
|
|
describe('.pending marker', () => {
|
|
test('finalizes stale .pending from another session as outcome:unknown', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
|
|
// Write a fake .pending marker from a different session
|
|
const analyticsDir = path.join(tmpDir, 'analytics');
|
|
fs.mkdirSync(analyticsDir, { recursive: true });
|
|
fs.writeFileSync(
|
|
path.join(analyticsDir, '.pending-old-123'),
|
|
'{"skill":"old-skill","ts":"2026-03-18T00:00:00Z","session_id":"old-123","gstack_version":"0.6.4"}'
|
|
);
|
|
|
|
// Run telemetry-log with a DIFFERENT session — should finalize the old pending marker
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id new-456`);
|
|
|
|
const events = parseJsonl();
|
|
expect(events).toHaveLength(2);
|
|
|
|
// First event: finalized pending
|
|
expect(events[0].skill).toBe('old-skill');
|
|
expect(events[0].outcome).toBe('unknown');
|
|
expect(events[0].session_id).toBe('old-123');
|
|
|
|
// Second event: new event
|
|
expect(events[1].skill).toBe('qa');
|
|
expect(events[1].outcome).toBe('success');
|
|
});
|
|
|
|
test('.pending-SESSION file is removed after finalization', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
|
|
const analyticsDir = path.join(tmpDir, 'analytics');
|
|
fs.mkdirSync(analyticsDir, { recursive: true });
|
|
const pendingPath = path.join(analyticsDir, '.pending-stale-session');
|
|
fs.writeFileSync(pendingPath, '{"skill":"stale","ts":"2026-03-18T00:00:00Z","session_id":"stale-session","gstack_version":"v"}');
|
|
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id new-456`);
|
|
|
|
expect(fs.existsSync(pendingPath)).toBe(false);
|
|
});
|
|
|
|
test('does not finalize own session pending marker', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
|
|
const analyticsDir = path.join(tmpDir, 'analytics');
|
|
fs.mkdirSync(analyticsDir, { recursive: true });
|
|
// Create pending for same session ID we'll use
|
|
const pendingPath = path.join(analyticsDir, '.pending-same-session');
|
|
fs.writeFileSync(pendingPath, '{"skill":"in-flight","ts":"2026-03-18T00:00:00Z","session_id":"same-session","gstack_version":"v"}');
|
|
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id same-session`);
|
|
|
|
// Should only have 1 event (the new one), not finalize own pending
|
|
const events = parseJsonl();
|
|
expect(events).toHaveLength(1);
|
|
expect(events[0].skill).toBe('qa');
|
|
});
|
|
|
|
test('tier=off still clears own session pending', () => {
|
|
setConfig('telemetry', 'off');
|
|
|
|
const analyticsDir = path.join(tmpDir, 'analytics');
|
|
fs.mkdirSync(analyticsDir, { recursive: true });
|
|
const pendingPath = path.join(analyticsDir, '.pending-off-123');
|
|
fs.writeFileSync(pendingPath, '{"skill":"stale","ts":"2026-03-18T00:00:00Z","session_id":"off-123","gstack_version":"v"}');
|
|
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id off-123`);
|
|
|
|
expect(fs.existsSync(pendingPath)).toBe(false);
|
|
// But no JSONL entries since tier=off
|
|
expect(readJsonl()).toHaveLength(0);
|
|
});
|
|
});
|
|
|
|
describe('gstack-analytics', () => {
|
|
test('shows "no data" for empty JSONL', () => {
|
|
const output = run(`${BIN}/gstack-analytics`);
|
|
expect(output).toContain('no data');
|
|
});
|
|
|
|
test('renders usage dashboard with events', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 120 --outcome success --session-id a-1`);
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 60 --outcome success --session-id a-2`);
|
|
run(`${BIN}/gstack-telemetry-log --skill ship --duration 30 --outcome error --error-class timeout --session-id a-3`);
|
|
|
|
const output = run(`${BIN}/gstack-analytics all`);
|
|
expect(output).toContain('/qa');
|
|
expect(output).toContain('/ship');
|
|
expect(output).toContain('2 runs');
|
|
expect(output).toContain('1 runs');
|
|
expect(output).toContain('Success rate: 66%');
|
|
expect(output).toContain('Errors: 1');
|
|
});
|
|
|
|
test('filters by time window', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 60 --outcome success --session-id t-1`);
|
|
|
|
const output7d = run(`${BIN}/gstack-analytics 7d`);
|
|
expect(output7d).toContain('/qa');
|
|
expect(output7d).toContain('last 7 days');
|
|
});
|
|
});
|
|
|
|
describe('gstack-telemetry-sync', () => {
|
|
test('exits silently with no Supabase URL configured', () => {
|
|
// Default: GSTACK_SUPABASE_URL is not set → exit 0
|
|
const result = run(`${BIN}/gstack-telemetry-sync`);
|
|
expect(result).toBe('');
|
|
});
|
|
|
|
test('exits silently with no JSONL file', () => {
|
|
const result = run(`${BIN}/gstack-telemetry-sync`, { GSTACK_SUPABASE_URL: 'http://localhost:9999' });
|
|
expect(result).toBe('');
|
|
});
|
|
|
|
test('does not rename JSONL field names (edge function expects raw names)', () => {
|
|
setConfig('telemetry', 'anonymous');
|
|
run(`${BIN}/gstack-telemetry-log --skill qa --duration 60 --outcome success --session-id raw-fields-1`);
|
|
|
|
const events = parseJsonl();
|
|
expect(events).toHaveLength(1);
|
|
// Edge function expects these raw field names, NOT Postgres column names
|
|
expect(events[0]).toHaveProperty('v');
|
|
expect(events[0]).toHaveProperty('ts');
|
|
expect(events[0]).toHaveProperty('sessions');
|
|
// Should NOT have Postgres column names
|
|
expect(events[0]).not.toHaveProperty('schema_version');
|
|
expect(events[0]).not.toHaveProperty('event_timestamp');
|
|
expect(events[0]).not.toHaveProperty('concurrent_sessions');
|
|
});
|
|
});
|
|
|
|
describe('gstack-community-dashboard', () => {
|
|
test('shows unconfigured message when no Supabase config available', () => {
|
|
// Use a fake GSTACK_DIR with no supabase/config.sh
|
|
const output = run(`${BIN}/gstack-community-dashboard`, {
|
|
GSTACK_DIR: tmpDir,
|
|
GSTACK_SUPABASE_URL: '',
|
|
GSTACK_SUPABASE_ANON_KEY: '',
|
|
});
|
|
expect(output).toContain('Supabase not configured');
|
|
expect(output).toContain('gstack-analytics');
|
|
});
|
|
|
|
test('connects to Supabase when config exists', () => {
|
|
// Use the real GSTACK_DIR which has supabase/config.sh
|
|
const output = run(`${BIN}/gstack-community-dashboard`);
|
|
expect(output).toContain('gstack community dashboard');
|
|
// Should not show "not configured" since config.sh exists
|
|
expect(output).not.toContain('Supabase not configured');
|
|
});
|
|
});
|