Files
gstack/test/telemetry.test.ts
T
Garry Tan 9fd03fae9e v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077)
* fix(gbrain): stop forcing GBRAIN_PREPARE on transaction-mode poolers (#1965)

buildGbrainEnv auto-set GBRAIN_PREPARE=true whenever DATABASE_URL targeted
port 6543, and the /sync-gbrain capability check exported it for the rest
of the skill run. Both had the semantics inverted: gbrain auto-disables
prepared statements on transaction-mode poolers because they break every
write there ("prepared statement does not exist"); GBRAIN_PREPARE=true is
gbrain's documented override for SESSION-mode poolers on 6543, not a
requirement for transaction mode. The #1435 search symptom the auto-set
worked around was fixed gbrain-side.

Remove both force-sets. A caller-set GBRAIN_PREPARE (either value) still
passes through untouched, preserving the session-mode-on-6543 escape hatch.
isTransactionModePooler stays exported.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(gbrain): classify probe timeout as its own status; sync proceeds instead of skipping (#1964)

The 5s engine probe misclassified healthy-but-slow engines (cold Supabase
pooler connections measured at 6.9-10.7s) as broken-config, so /sync-gbrain
silently skipped code+memory and told the user their config was malformed.

- New "timeout" status: probe killed at the deadline with no recognized
  stderr pattern. Default deadline is now 15s, overridable via
  GSTACK_GBRAIN_PROBE_TIMEOUT_MS (tests set 300ms against a fake that
  sleeps 2s).
- Sync stages PROCEED on timeout with a stderr warning naming the env knob;
  a genuinely-dead engine surfaces its real error at the first operation
  instead of a false config diagnosis.
- Consistency everywhere "ok" gated behavior: gstack-gbrain-detect --is-ok
  exits 0 on timeout, and gen-skill-docs' detection gate accepts it, so a
  slow engine no longer silently suppresses brain-aware features.
- Status cache: key now includes the effective probe timeout (raising it
  invalidates a cached timeout) and GBRAIN_HOME; config detection honors
  GBRAIN_HOME so relocated-home users stop being misclassified as
  missing-config.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(bins): cygpath-normalize SCRIPT_DIR for bun imports; surface learnings-log errors (#1950)

Under Windows git-bash, pwd yields a POSIX path (/c/Users/...) that Bun on
Windows cannot resolve as an ES module specifier. gstack-learnings-log
interpolates SCRIPT_DIR into a bun -e import, so every invocation died with
"Cannot find module" — and 2>/dev/null swallowed the error, silently
dropping every AI-logged learning for Windows users.

- 3-line cygpath -m guard in gstack-learnings-log and gstack-question-log
  (which gains the same import shape in the next commit). Matches the
  duplicated IS_WINDOWS convention in setup; no shared shell lib exists.
- learnings-log adopts question-log's set +e / TMPERR capture pattern
  wholesale: validation errors now print to stderr. The old
  `if [ $? -ne 0 ]` check was dead code under set -euo pipefail — the
  script exited at the failing assignment before reaching it.
- New test/bin-windows-bun-import-paths.test.ts: static invariant (any
  bash bin interpolating $SCRIPT_DIR into a bun -e import must carry the
  guard) + behavioral end-to-end run invoked via `bash <bin>` — added to
  the windows-free-tests workflow list so the conversion is proven on the
  only platform where the bug exists.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(question-log): dedupe INJECTION_PATTERNS via lib/jsonl-store (#1934)

bin/gstack-question-log carried a local copy of the injection-pattern list,
so pattern fixes to lib/jsonl-store.ts never propagated — including the
/override[:\s]/i false-positive fix arriving via community PR #1940.
Import the shared hasInjection instead (enabled by the previous commit's
cygpath guard). question-log also gets the lib's stricter superset
(human:, disregard, from-now-on, approve-all patterns).

Tests pin the contract in a #1940-order-independent way: an "Override:
ignore all previous instructions" header is rejected, "prose overrides the
deterministic table" is accepted, and a static invariant keeps local
INJECTION_PATTERNS duplicates out of the bin.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(security): community-pulse + both dashboards never report fake zeros (#1947)

The security-signaling surface failed open at three layers — every failure
mode read as a reassuring "0 attacks" / "0 installs":

- community-pulse edge function: supabase-js returns {data,error} without
  throwing, and all five queries discarded `error` — a DB outage produced
  real-looking zeros via the SUCCESS path, and the catch (also returning
  zeros with HTTP 200) was unreachable for query failures. Every query now
  destructures and throws; the catch serves the stale cache (marked
  "stale": true) when one exists, else 503 {"error":"pulse_unavailable"}.
  Success responses carry "status":"ok" so clients can distinguish
  authoritative data from legacy backends. NOTE: the edge function deploys
  out-of-band (supabase functions deploy community-pulse).
- gstack-security-dashboard: captures the HTTP status; non-200 / network
  failure / error body / missing section → "unknown — backend error";
  jq missing → "unknown — install jq" (the lossy grep fallback broke on
  nested arrays and under-reported attacks as zero — removed); a 200
  without the new marker shows figures with an "unverified (legacy
  backend)" note. Also fixes a latent display bug: the TOTAL grep matched
  the digit 7 inside "attacks_last_7_days" and misreported every count.
- gstack-community-dashboard: same class — curl || echo "{}" plus
  grep || echo "0" printed "Weekly active installs: 0" on any failure.
  Now "unknown — backend error (HTTP N)".

test/security-dashboard-fallback.test.ts pins the matrix (200+marker,
200-legacy, 503, network failure) x (jq present, jq absent) for both bins:
"unknown" states never render as 0.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(telemetry): redact error_message spans before they leave the machine (#1947)

error_message was uploaded with only quote/newline escaping — stack traces
and failed-API errors can embed credentials, private paths, and hostnames,
and the sync path strips only _repo_slug/_branch.

New lib/redact-engine.ts export redactFindingSpans(): replaces EVERY
finding's span with <REDACTED-{id}> regardless of tier (applyRedactions is
the interactive PII-only path and exits nonzero on credential findings, so
it can't serve machine egress). Returns null when a span can't be located —
callers drop the whole payload rather than risk a leak.

gstack-telemetry-log pipes error_message through it at LOG time, so the
local JSONL at rest is clean too; surrounding text survives for crash
triage. FAIL CLOSED: bun missing, engine error, or non-JSON-string output
all null the field. Tests pin: embedded ghp_ token → <REDACTED-github.pat>
with context intact; redactor unavailable → null; raw bytes on disk never
contain the token.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(redact): prepush guard fails closed on git failure; /ship owns hook install (#1946)

Two gaps closed:

1. Fail closed. The git() helper returned "" on ANY non-zero exit or
   maxBuffer overflow (status null), addedLinesFor produced an empty
   string, and the push sailed through unscanned — fail-open on exactly
   the oversized-diff case where a large secret-bearing blob is most
   likely. The diff call now uses a strict variant that throws; main
   blocks with a clear message naming the GSTACK_REDACT_PREPUSH=skip
   escape valve. Probe calls (symbolic-ref, rev-parse, merge-base) keep
   the permissive helper — their failures are normal control flow.

2. Install path. The hook was installed by nothing ("opt-in, installed by
   nothing" was the issue's words). ./setup runs in the gstack checkout —
   the wrong repo for a per-project hook — so it gets a one-line hint
   only. /ship owns per-repo install: config redact_prepush_hook=true +
   hook missing → silent install (consent already given); config unset +
   no ~/.gstack/.redact-prepush-prompted marker → one-time machine-wide
   AskUserQuestion offer, answer persisted. ship/SKILL.md regenerated in
   this same commit (check-freshness bisect discipline).

Tests: unscannable diff (bogus SHAs) → exit 1 + valve named; empty-but-
successful diff → exit 0; static asserts pin setup as hint-only and the
ship template as the installer surface.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(redact): six new credential patterns — GitLab, HuggingFace, npm, DigitalOcean, Bearer, GCP SA (#1946)

Coverage gaps from the #1946 security review, including token types for
tooling gstack itself drives (glab):

HIGH (block): gitlab.token (glpat-/glptt-/gldt-), huggingface.token (hf_),
npm.token (npm_), digitalocean.token (dop_v1_), gcp.service_account (the
JSON-escaped "private_key" form that dodges pem.private_key's literal-block
match when minified, confirmed by "private_key_id" proximity).

MEDIUM (warn): auth.bearer — the most FP-prone shape in the set (docs are
full of "Authorization: Bearer <token>"), so it requires header-context
proximity and the same entropy>=3.0 + placeholder validator recipe as
env.kv. "Bearer YOUR_TOKEN_HERE" never fires; calibration over coverage,
per the cries-wolf principle.

All shapes are linear-time; test/redact-pattern-lint.test.ts covers them
automatically. Engine tests add positive + placeholder-negative cases per
pattern.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: coverage-audit additions for the fix wave

Ship Step 7 gap-fill (all passing, 248 tests across the touched suites):
memory + dream stage probe-timeout proceeds, gbrain-detect override paths,
stale-flag passthrough, 200-body-missing-.security fail-closed case,
telemetry redaction edges, and credential-pattern edge cases.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: pre-landing review fixes

Review army findings (1 critical, auto-fixed with regression tests):

- CRITICAL (security specialist, verified live): redactFindingSpans spliced
  only the regex capture span, and pem.private_key / gcp.service_account
  capture just the BEGIN-header — the key body survived "redaction" and
  shipped via telemetry. Marker-only patterns now drop the whole payload
  (null, fail closed). Overlapping spans (Bearer+JWT on the same bytes) are
  coalesced before splicing so stale offsets can't leave partial secret
  bytes behind.
- gitStrict: drop the dead `|| r.status === null` disjunct (null !== 0
  already covers it); add the signal-kill/null-status regression test the
  docstring promised.
- security-dashboard human mode flags stale snapshots ("figures may be out
  of date") instead of presenting frozen counts as current.
- community-dashboard marker check uses jq when available — the grep-only
  variant misclassified whitespaced/reserialized bodies as legacy.
- telemetry fail-closed test now shadows bun with a failing stub
  (deterministic on any host layout); stale "five status cases" describe
  title renamed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: adversarial review fixes (Claude + Codex cross-model passes)

Both adversarial passes ran against the wave; every FIXABLE finding landed
with a regression test:

- probeTimeoutMs clamps to >=1ms: a fractional override floored to 0, and
  execFileSync treats timeout:0 as NO timeout — the probe that exists to
  bound hangs could hang forever (found by both models independently).
- /ship silent hook install now requires the hooks dir to live inside
  .git: with core.hooksPath (husky's COMMITTED .husky/), the chaining
  installer would have renamed the team's committed pre-push and written a
  machine-local wrapper into the working tree (found by both models).
- gstack-config gbrain-refresh accepts the "timeout" status — the last
  consumer still gating on literal "ok" (Codex); gstack-gbrain-detect's
  config-derived fields honor GBRAIN_HOME so the detection JSON can't
  report status ok alongside config_exists false (Codex).
- prepush: a remote sha absent locally (shallow clone / stale fetch) falls
  back to the merge-base/empty-tree range — scans MORE, never blocks a
  legitimate push into training users toward --no-verify.
- dashboards: curl's own 000 no longer doubles to "HTTP 000000"; the
  community dashboard flags stale snapshots like the security one; array
  sections parse via jq (the sed/grep loops truncated at the first ']');
  the no-jq marker grep tolerates whitespace.
- telemetry: multi-line redactor output nulls the field instead of
  corrupting the JSONL record; setup's hint fires only when the config key
  is genuinely unset (an explicit false is a recorded decline); the /ship
  prompt marker honors GSTACK_HOME.

Kept as designed (cross-model tension noted): Bearer stays MEDIUM in the
prepush gate — a HIGH Bearer would block every docs example; the entropy
validator can't eliminate that FP class, and MEDIUM warns visibly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.11.0)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: P1 TODO — eval harness live progress + incremental persistence

Root-caused during this ship: a killed eval run was indistinguishable from a
healthy one for hours (per-file output buffering across mega test files, no
incremental eval-store writes, no honest liveness signal). Full context and
starting points in the entry.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: fix operational-learning E2E fixture — copy lib/jsonl-store.ts

Pre-existing breakage, proven on main: gstack-learnings-log has imported
lib/jsonl-store.ts (shared injection patterns) since v1.57.5.0 / #1910, but
the fixture copies only the bin scripts — the bin exits 1 before writing
anything, on main silently (stderr swallowed) and on this branch loudly
(the #1950 error-surfacing made the four-day-old failure visible). A real
install always ships bin/ and lib/ together; the fixture now does too.
Verified: the fixture-shaped invocation writes the learning (exit 0) with
lib present, exits 1 on both main and this branch without it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(ios-qa): isolate E2E tests under --concurrent (3 real races)

The ios-qa E2E file failed intermittently under `bun test --concurrent`
(the eval harness default). Three distinct shared-state races, all fixed:

1. Shared pidfile: a module-level `workDir` reassigned in beforeEach was
   clobbered by parallel tests, so concurrent daemons collided on the same
   pidfile and the loser returned `already_running`. Each test now gets its
   own dir via makeWorkDir().
2. process.env path globals: tests set GSTACK_IOS_AUDIT_PATH /
   _ATTEMPTS_PATH / _ALLOWLIST_PATH on the shared process env; concurrent
   tests stomped each other's audit/attempts destinations. Threaded
   auditPath/attemptsPath/allowlistPath through DaemonOptions (and
   mintForCaller) as explicit args — env is no longer load-bearing.
3. afterEach cleanup race: the per-test cleanup drained a shared dir array,
   so the first test to finish deleted still-running tests' workDirs
   mid-assertion. Moved to afterAll (cleans once, after all settle).

Verified: 5/5 clean full-suite runs at --max-concurrency 15 (was
intermittent); daemon unit suite 91/91; daemon source compiles. The paths
default to the env-derived locations when options are omitted, so the
production CLI path is unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(pty): pin spawned claude to EVALS model chain (default claude-sonnet-4-6)

launchClaudePty spawned the interactive `claude` TUI with no --model flag, so
the child inherited the operator's ~/.claude/settings.json model. On a
slow-thinking model that meant 5+ min of extended thinking on empty plan-mode
context, timing out the plan-mode smoke tests regardless of contention. Pin the
model via opts.model ?? EVALS_MODEL ?? 'claude-sonnet-4-6' — byte-identical to
session-runner.ts:144, so PTY and `claude -p` evals always agree.

Pushed before extraArgs (last flag wins, so a per-test --model still overrides).
Placement leaves the spawn region byte-stable for a clean merge with the
in-flight hermetic-env branch. Plumbed model through the three plan-skill
wrappers. Static-grep tripwires guard the pin, its fallback chain, the
before-extraArgs ordering, and all three wrapper forwards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(pty): detect markdown bold-bullet prose AUQs (fixes office-hours smoke)

office-hours auto-mode renders its mode question as `- **Building a startup**`
markdown bullets (office-hours/SKILL.md.tmpl:102) with no letter/number marker.
isProseAUQVisible only matched `A)`-style lettered or `1.`-style numbered
options, so the question went undetected: the model surfaced it at ~2m19s
(well under the 300s budget) but the harness kept scoring the run "working"
off the spinner glyphs and timed out — a false timeout on a question that was
already on screen.

Add Pattern 3: when an interrogative line ('?') is present AND 3+ bold-bullet
markers (`- **`) appear in the 4KB tail, classify as a prose AUQ. Bold is the
discriminator vs incidental prose bullets; the line anchor is dropped (stripAnsi
can collapse option lines) and the existing `❯ 1.` cursor gate still defers to a
live native list. Wires through the existing classifyVisible 'asked' path and the
timeout high-water-mark, so office-hours now classifies 'asked' instead of
'timeout'. Five unit cases: the office-hours render passes; no-'?', <3-bullet,
plain-bullet, and native-cursor cases stay false.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(pty): detect stripAnsi-collapsed prose AUQs + judge spinner-precedence

The plan-eng/plan-design plan-mode + finding-floor smokes timed out even when
the skill HAD rendered a complete prose AskUserQuestion and was waiting: the PTY
strips cursor-positioning escapes, collapsing the option newlines/spaces so
"A) ..." arrives as "A(recommended)" / "-B:" and "Reply with A, B, or C" as
"ReplywithA,B,orC". Every line-anchored detector (Patterns 1-3) returns false on
those bytes, so proseAUQEverObserved never latched and the run timed out on a
question that was already on screen.

Add Pattern 4/5: a two-signal collapsed-form detector — a reply/recommendation
marker (space-insensitive "reply with [A-D]", "Recommendation:", or
"(recommended)") AND 2+ distinct A-D letters each punctuated by ) : or (. The
conjunction is what separates a real AUQ from incidental report prose; verified
true on the verbatim failing-run buffers where Patterns 1-3 return false.

Also fix the Haiku judge spinner bias: of 614 verdicts, 569 were 'working' and
95 of those noted a question was visible — Claude Code keeps the spinner
animating at an idle prose decision, so the judge coin-flipped. Add a precedence
override: when an option list AND a Recommendation/Reply instruction are both
visible, classify WAITING even with spinner glyphs. Kept the strict dual-signal
gate (never option-list-alone) so auto-decide-preserved doesn't flip.

5 unit tests pin the two-signal contract (2 true on real collapsed bytes, 3
false guards). 90 -> 95 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(plan-review): ask-first scope gate for plan-eng + plan-design review

On an empty/cold invocation, plan-eng-review and plan-design-review would dive
straight into repo exploration (plan-eng) or a 7-pass mockup+audit (plan-design)
and only ask the user much later, if at all. plan-ceo-review already asks first
via an unconditional Step-0 gate and behaves well; these two did not.

Add a hard-STOP scope gate as the FIRST operational instruction in each skill
(above the design-doc check / pre-review audit / mockup defaults it explicitly
overrides): the first tool call must be AskUserQuestion confirming the review
target, before any git/Read/Grep/Glob/Bash or mockup generation. Under
--disallowedTools the options render as plain column-0 lettered prose with a
Recommendation + "Reply with A, B, or C" line so the answer is detectable.

This is correct cold-start UX (confirm what to review before grinding a full
review on nothing) and it is the product half of the plan-mode smoke fix; the
harness collapsed-form detector is the deterministic half that catches the ask
however it renders. Templates + regenerated SKILL.md (default variant).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(tiers): reclassify stochastic plan-eng/plan-design ask-first smokes as periodic

plan-eng-review and plan-design-review run a long explore/audit before their
first AskUserQuestion, so whether the plan-mode + finding-floor smokes reach a
terminal outcome within the 300s/600s budget depends on stochastic ask-first
compliance (measured ~50-67%/run even with the hardened gate). Per the
"non-deterministic -> periodic" tiering rule, move the four affected smokes
(plan-eng/plan-design review-plan-mode + finding-floor) to periodic.

The deterministic harness fix (collapsed-form detector + judge precedence) and
the ask-first gate lift these from always-failing to mostly-passing and are the
real product+harness improvements; periodic monitoring tracks the rate weekly
without blocking PRs on an LLM coin-flip. plan-ceo/plan-devex ask-first reliably
and stay gate-tier.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(evals): gate the deterministic PTY plan-mode smokes in CI

The real-PTY plan-mode smokes never ran in CI — the gate was local-only. Add an
e2e-pty-plan-smoke matrix suite running the two deterministically-reliable ones
(office-hours-auto-mode, plan-mode-no-op) so a regression there blocks PRs. The
stochastic plan-eng/plan-design ask-first smokes stay periodic (touchfiles
E2E_TIERS) and are not CI-gated.

A fresh CI container has no ~/.claude.json, so the spawned interactive `claude`
would wedge on the onboarding + API-key-approval dialog. Add a scoped seed step
(hasCompletedOnboarding + key approval, its own ANTHROPIC_API_KEY env) before the
run — mirrors what the hermetic E2E child env seeds. Per-suite timeout override
(35 min) via matrix.suite.timeout so the PTY suite has headroom for --retry 2
without bumping the other 12 suites. Report runner count 12 -> 13.

Validate via workflow_dispatch before relying on the gate (PTY-in-CI is new).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(evals): install gstack skill registry for the PTY smoke suite

The first dry-run of e2e-pty-plan-smoke failed: the spawned interactive `claude`
printed "Unknown command: /plan-ceo-review". .claude/skills is gitignored, so a
fresh CI checkout has no gstack skill registry and the TUI can't resolve
/office-hours or /plan-ceo-review.

Add a Register step (scoped to the suite, after Seed, before Run) that mirrors
setup's --no-prefix user-scoped registry minimally: $HOME/.claude/skills/gstack
-> repo (resolves the preambles' absolute ~/.claude/skills/gstack/bin/* and
<skill>/sections/* paths) + per-skill SKILL.md/sections symlinks for the two
skills these tests invoke. HOME is /github/home in this container and the runner
adds no HOME/CLAUDE_CONFIG_DIR override (no hermetic mode), so $HOME is the right
anchor — the Seed step already proved claude reads it. No ./setup (binary build
+ Chromium + fonts + /dev/tty prompt); SKILL.md + bin/ + sections/ are committed.

Self-validating: fails the step loudly on a dangling symlink or missing
`name:` frontmatter, so a moved target surfaces here instead of as a silent
35-min "Unknown command" timeout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.58.4.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-21 07:15:19 -07:00

510 lines
20 KiB
TypeScript

import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import { execSync } from 'child_process';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
const ROOT = path.resolve(import.meta.dir, '..');
const BIN = path.join(ROOT, 'bin');
// Each test gets a fresh temp directory for GSTACK_STATE_DIR
let tmpDir: string;
function run(cmd: string, env: Record<string, string> = {}): string {
return execSync(cmd, {
cwd: ROOT,
env: { ...process.env, GSTACK_STATE_DIR: tmpDir, GSTACK_DIR: ROOT, ...env },
encoding: 'utf-8',
timeout: 10000,
}).trim();
}
function setConfig(key: string, value: string) {
run(`${BIN}/gstack-config set ${key} ${value}`);
}
function readJsonl(): string[] {
const file = path.join(tmpDir, 'analytics', 'skill-usage.jsonl');
if (!fs.existsSync(file)) return [];
return fs.readFileSync(file, 'utf-8').trim().split('\n').filter(Boolean);
}
function parseJsonl(): any[] {
return readJsonl().map(line => JSON.parse(line));
}
beforeEach(() => {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-tel-'));
});
afterEach(() => {
fs.rmSync(tmpDir, { recursive: true, force: true });
});
describe('gstack-telemetry-log', () => {
test('appends valid JSONL when tier=anonymous', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 142 --outcome success --session-id test-123`);
const events = parseJsonl();
expect(events).toHaveLength(1);
expect(events[0].v).toBe(1);
expect(events[0].skill).toBe('qa');
expect(events[0].duration_s).toBe(142);
expect(events[0].outcome).toBe('success');
expect(events[0].session_id).toBe('test-123');
expect(events[0].event_type).toBe('skill_run');
expect(events[0].os).toBeTruthy();
expect(events[0].gstack_version).toBeTruthy();
});
test('produces no output when tier=off', () => {
setConfig('telemetry', 'off');
run(`${BIN}/gstack-telemetry-log --skill ship --duration 30 --outcome success --session-id test-456`);
expect(readJsonl()).toHaveLength(0);
});
test('defaults to off for invalid tier value', () => {
setConfig('telemetry', 'invalid_value');
run(`${BIN}/gstack-telemetry-log --skill ship --duration 30 --outcome success --session-id test-789`);
expect(readJsonl()).toHaveLength(0);
});
test('includes installation_id for community tier', () => {
setConfig('telemetry', 'community');
run(`${BIN}/gstack-telemetry-log --skill review --duration 100 --outcome success --session-id comm-123`);
const events = parseJsonl();
expect(events).toHaveLength(1);
// installation_id should be a UUID v4 (or hex fallback)
expect(events[0].installation_id).toMatch(/^[a-f0-9-]{32,36}$/);
});
test('installation_id is null for anonymous tier', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id anon-123`);
const events = parseJsonl();
expect(events[0].installation_id).toBeNull();
});
test('includes error_class when provided', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill browse --duration 10 --outcome error --error-class timeout --session-id err-123`);
const events = parseJsonl();
expect(events[0].error_class).toBe('timeout');
expect(events[0].outcome).toBe('error');
});
test('handles missing duration gracefully', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --outcome success --session-id nodur-123`);
const events = parseJsonl();
expect(events[0].duration_s).toBeNull();
});
test('supports event_type flag', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --event-type upgrade_prompted --skill "" --outcome success --session-id up-123`);
const events = parseJsonl();
expect(events[0].event_type).toBe('upgrade_prompted');
});
test('includes local-only fields (_repo_slug, _branch)', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id local-123`);
const events = parseJsonl();
// These should be present in local JSONL
expect(events[0]).toHaveProperty('_repo_slug');
expect(events[0]).toHaveProperty('_branch');
});
// ─── json_safe() injection prevention tests ────────────────
test('sanitizes skill name with quote injection attempt', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill 'review","injected":"true' --duration 10 --outcome success --session-id inj-1`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
// Must be valid JSON (no injection — quotes stripped, so no field injection possible)
const event = JSON.parse(lines[0]);
// The key check: no injected top-level property was created
expect(event).not.toHaveProperty('injected');
// Skill field should have quotes stripped but content preserved
expect(event.skill).not.toContain('"');
});
test('truncates skill name exceeding 200 chars', () => {
setConfig('telemetry', 'anonymous');
const longSkill = 'a'.repeat(250);
run(`${BIN}/gstack-telemetry-log --skill '${longSkill}' --duration 10 --outcome success --session-id trunc-1`);
const events = parseJsonl();
expect(events[0].skill.length).toBeLessThanOrEqual(200);
});
test('sanitizes outcome with newline injection attempt', () => {
setConfig('telemetry', 'anonymous');
// Use printf to pass actual newline in the argument
run(`bash -c 'OUTCOME=$(printf "success\\nfake\\":\\"true"); ${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome "$OUTCOME" --session-id inj-2'`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event).not.toHaveProperty('fake');
});
test('sanitizes session_id with backslash-quote injection', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome success --session-id 'id\\\\"","x":"y'`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event).not.toHaveProperty('x');
});
test('sanitizes error_class with quote injection', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-class 'timeout","extra":"val' --session-id inj-3`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event).not.toHaveProperty('extra');
});
test('sanitizes failed_step with quote injection', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --failed-step 'step1","hacked":"yes' --session-id inj-4`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event).not.toHaveProperty('hacked');
});
test('escapes error_message quotes and preserves content', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-message 'Error: file "test.txt" not found' --session-id inj-5`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event.error_message).toContain('file');
expect(event.error_message).toContain('not found');
});
test('redacts credential spans in error_message before they touch disk (#1947)', () => {
setConfig('telemetry', 'anonymous');
const token = 'ghp_' + 'A1b2C3d4E5f6G7h8I9j0K1l2M3n4O5p6Q7r8';
run(
`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-message 'push failed: auth ${token} rejected by remote' --session-id red-1`,
);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
// The span is masked, the surrounding triage context survives.
expect(event.error_message).toContain('<REDACTED-github.pat>');
expect(event.error_message).toContain('push failed');
expect(event.error_message).not.toContain(token);
// Raw bytes on disk never contain the token either.
expect(lines[0]).not.toContain(token);
});
test('fails closed: error_message becomes null when the redactor is unavailable (#1947)', () => {
setConfig('telemetry', 'anonymous');
const token = 'ghp_' + 'A1b2C3d4E5f6G7h8I9j0K1l2M3n4O5p6Q7r8';
// Shadow bun with a failing stub on a prepended PATH (deterministic on
// any host layout — pre-landing review flagged the bare '/usr/bin:/bin'
// variant as environment-dependent): the redaction snippet cannot run,
// so the whole message must drop — never raw passthrough.
const stubBin = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-tel-nobun-'));
try {
fs.writeFileSync(path.join(stubBin, 'bun'), '#!/bin/sh\nexit 127\n');
fs.chmodSync(path.join(stubBin, 'bun'), 0o755);
run(
`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-message 'auth ${token} rejected' --session-id red-2`,
{ PATH: `${stubBin}:${process.env.PATH}` },
);
} finally {
fs.rmSync(stubBin, { recursive: true, force: true });
}
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event.error_message).toBeNull();
expect(lines[0]).not.toContain(token);
});
test('fails closed: PEM key in error_message drops the whole message (#1947 review fix)', () => {
setConfig('telemetry', 'anonymous');
// Header-only pattern: span replacement would forward the key body, so
// the engine returns null and the bin must store null.
run(
`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-message 'deploy failed: -----BEGIN PRIVATE KEY----- MIIEvQIBADANBgkqhkiG9w0BAQEFAASC' --session-id red-5`,
);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event.error_message).toBeNull();
expect(lines[0]).not.toContain('MIIEvQIBADAN');
});
test('truncates error_message to 200 chars after redaction (#1947)', () => {
setConfig('telemetry', 'anonymous');
const long = 'x'.repeat(300);
run(
`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-message '${long}' --session-id red-3`,
);
const events = parseJsonl();
expect(events).toHaveLength(1);
expect(events[0].error_message.length).toBeLessThanOrEqual(200);
});
test('fails closed: error_message becomes null when the engine cannot relocate a span (#1947)', () => {
setConfig('telemetry', 'anonymous');
const secret = '8Fk2pQ9vXz4wL7mN3rT6yB1cD5eG0hJq';
// env.kv-shaped finding (line-anchored, so the assignment leads the
// message): the span (value) starts past the regex match start,
// locateSpan misses it, redactFindingSpans returns null — the bin must
// drop the whole message, never pass it through raw.
run(
`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome error --error-message 'API_KEY=${secret} rejected by daemon' --session-id red-4`,
);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event.error_message).toBeNull();
expect(lines[0]).not.toContain(secret);
});
test('creates analytics directory if missing', () => {
// Remove analytics dir
const analyticsDir = path.join(tmpDir, 'analytics');
if (fs.existsSync(analyticsDir)) fs.rmSync(analyticsDir, { recursive: true });
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id mkdir-123`);
expect(fs.existsSync(analyticsDir)).toBe(true);
expect(readJsonl()).toHaveLength(1);
});
// ─── Telemetry JSON safety: branch/repo with special chars ────
test('branch name with quotes does not corrupt JSON', () => {
setConfig('telemetry', 'anonymous');
// Simulate a branch name with double quotes by setting it via git env override
// The json_safe function strips quotes, so the JSONL should remain valid
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome success --session-id branch-quotes-1`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
// Every line must be valid JSON
const event = JSON.parse(lines[0]);
expect(event._branch).toBeDefined();
// _branch should not contain double quotes (json_safe strips them)
expect(event._branch).not.toContain('"');
});
test('repo slug with special chars does not corrupt JSON', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 10 --outcome success --session-id repo-special-1`);
const lines = readJsonl();
expect(lines).toHaveLength(1);
const event = JSON.parse(lines[0]);
expect(event._repo_slug).toBeDefined();
// _repo_slug should not contain double quotes (json_safe strips them)
expect(event._repo_slug).not.toContain('"');
});
});
describe('.pending marker', () => {
test('finalizes stale .pending from another session as outcome:unknown', () => {
setConfig('telemetry', 'anonymous');
// Write a fake .pending marker from a different session
const analyticsDir = path.join(tmpDir, 'analytics');
fs.mkdirSync(analyticsDir, { recursive: true });
fs.writeFileSync(
path.join(analyticsDir, '.pending-old-123'),
'{"skill":"old-skill","ts":"2026-03-18T00:00:00Z","session_id":"old-123","gstack_version":"0.6.4"}'
);
// Run telemetry-log with a DIFFERENT session — should finalize the old pending marker
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id new-456`);
const events = parseJsonl();
expect(events).toHaveLength(2);
// First event: finalized pending
expect(events[0].skill).toBe('old-skill');
expect(events[0].outcome).toBe('unknown');
expect(events[0].session_id).toBe('old-123');
// Second event: new event
expect(events[1].skill).toBe('qa');
expect(events[1].outcome).toBe('success');
});
test('.pending-SESSION file is removed after finalization', () => {
setConfig('telemetry', 'anonymous');
const analyticsDir = path.join(tmpDir, 'analytics');
fs.mkdirSync(analyticsDir, { recursive: true });
const pendingPath = path.join(analyticsDir, '.pending-stale-session');
fs.writeFileSync(pendingPath, '{"skill":"stale","ts":"2026-03-18T00:00:00Z","session_id":"stale-session","gstack_version":"v"}');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id new-456`);
expect(fs.existsSync(pendingPath)).toBe(false);
});
test('does not finalize own session pending marker', () => {
setConfig('telemetry', 'anonymous');
const analyticsDir = path.join(tmpDir, 'analytics');
fs.mkdirSync(analyticsDir, { recursive: true });
// Create pending for same session ID we'll use
const pendingPath = path.join(analyticsDir, '.pending-same-session');
fs.writeFileSync(pendingPath, '{"skill":"in-flight","ts":"2026-03-18T00:00:00Z","session_id":"same-session","gstack_version":"v"}');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id same-session`);
// Should only have 1 event (the new one), not finalize own pending
const events = parseJsonl();
expect(events).toHaveLength(1);
expect(events[0].skill).toBe('qa');
});
test('tier=off still clears own session pending', () => {
setConfig('telemetry', 'off');
const analyticsDir = path.join(tmpDir, 'analytics');
fs.mkdirSync(analyticsDir, { recursive: true });
const pendingPath = path.join(analyticsDir, '.pending-off-123');
fs.writeFileSync(pendingPath, '{"skill":"stale","ts":"2026-03-18T00:00:00Z","session_id":"off-123","gstack_version":"v"}');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 50 --outcome success --session-id off-123`);
expect(fs.existsSync(pendingPath)).toBe(false);
// But no JSONL entries since tier=off
expect(readJsonl()).toHaveLength(0);
});
});
describe('gstack-analytics', () => {
test('shows "no data" for empty JSONL', () => {
const output = run(`${BIN}/gstack-analytics`);
expect(output).toContain('no data');
});
test('renders usage dashboard with events', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 120 --outcome success --session-id a-1`);
run(`${BIN}/gstack-telemetry-log --skill qa --duration 60 --outcome success --session-id a-2`);
run(`${BIN}/gstack-telemetry-log --skill ship --duration 30 --outcome error --error-class timeout --session-id a-3`);
const output = run(`${BIN}/gstack-analytics all`);
expect(output).toContain('/qa');
expect(output).toContain('/ship');
expect(output).toContain('2 runs');
expect(output).toContain('1 runs');
expect(output).toContain('Success rate: 66%');
expect(output).toContain('Errors: 1');
});
test('filters by time window', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 60 --outcome success --session-id t-1`);
const output7d = run(`${BIN}/gstack-analytics 7d`);
expect(output7d).toContain('/qa');
expect(output7d).toContain('last 7 days');
});
});
describe('gstack-telemetry-sync', () => {
test('exits silently with no Supabase URL configured', () => {
// Default: GSTACK_SUPABASE_URL is not set → exit 0
const result = run(`${BIN}/gstack-telemetry-sync`);
expect(result).toBe('');
});
test('exits silently with no JSONL file', () => {
const result = run(`${BIN}/gstack-telemetry-sync`, { GSTACK_SUPABASE_URL: 'http://localhost:9999' });
expect(result).toBe('');
});
test('does not rename JSONL field names (edge function expects raw names)', () => {
setConfig('telemetry', 'anonymous');
run(`${BIN}/gstack-telemetry-log --skill qa --duration 60 --outcome success --session-id raw-fields-1`);
const events = parseJsonl();
expect(events).toHaveLength(1);
// Edge function expects these raw field names, NOT Postgres column names
expect(events[0]).toHaveProperty('v');
expect(events[0]).toHaveProperty('ts');
expect(events[0]).toHaveProperty('sessions');
// Should NOT have Postgres column names
expect(events[0]).not.toHaveProperty('schema_version');
expect(events[0]).not.toHaveProperty('event_timestamp');
expect(events[0]).not.toHaveProperty('concurrent_sessions');
});
});
describe('gstack-community-dashboard', () => {
test('shows unconfigured message when no Supabase config available', () => {
// Use a fake GSTACK_DIR with no supabase/config.sh
const output = run(`${BIN}/gstack-community-dashboard`, {
GSTACK_DIR: tmpDir,
GSTACK_SUPABASE_URL: '',
GSTACK_SUPABASE_ANON_KEY: '',
});
expect(output).toContain('Supabase not configured');
expect(output).toContain('gstack-analytics');
});
test('connects to Supabase when config exists', () => {
// Use the real GSTACK_DIR which has supabase/config.sh
const output = run(`${BIN}/gstack-community-dashboard`);
expect(output).toContain('gstack community dashboard');
// Should not show "not configured" since config.sh exists
expect(output).not.toContain('Supabase not configured');
});
});
describe('preamble telemetry gating (#467)', () => {
test('preamble source does not write JSONL unconditionally', () => {
const preamble = fs.readFileSync(path.join(ROOT, 'scripts', 'resolvers', 'preamble.ts'), 'utf-8');
const lines = preamble.split('\n');
for (let i = 0; i < lines.length; i++) {
if (lines[i].includes('skill-usage.jsonl') && lines[i].includes('>>')) {
// Each JSONL write must be inside a _TEL conditional (within 5 lines above)
let foundConditional = false;
for (let j = i - 1; j >= Math.max(0, i - 5); j--) {
if (lines[j].includes('_TEL') && lines[j].includes('off')) {
foundConditional = true;
break;
}
}
if (!foundConditional) {
throw new Error(`Unconditional JSONL write at preamble.ts line ${i + 1}: ${lines[i].trim()}`);
}
}
}
});
});