mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-22 17:49:57 +02:00
9fd03fae9e
* fix(gbrain): stop forcing GBRAIN_PREPARE on transaction-mode poolers (#1965) buildGbrainEnv auto-set GBRAIN_PREPARE=true whenever DATABASE_URL targeted port 6543, and the /sync-gbrain capability check exported it for the rest of the skill run. Both had the semantics inverted: gbrain auto-disables prepared statements on transaction-mode poolers because they break every write there ("prepared statement does not exist"); GBRAIN_PREPARE=true is gbrain's documented override for SESSION-mode poolers on 6543, not a requirement for transaction mode. The #1435 search symptom the auto-set worked around was fixed gbrain-side. Remove both force-sets. A caller-set GBRAIN_PREPARE (either value) still passes through untouched, preserving the session-mode-on-6543 escape hatch. isTransactionModePooler stays exported. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(gbrain): classify probe timeout as its own status; sync proceeds instead of skipping (#1964) The 5s engine probe misclassified healthy-but-slow engines (cold Supabase pooler connections measured at 6.9-10.7s) as broken-config, so /sync-gbrain silently skipped code+memory and told the user their config was malformed. - New "timeout" status: probe killed at the deadline with no recognized stderr pattern. Default deadline is now 15s, overridable via GSTACK_GBRAIN_PROBE_TIMEOUT_MS (tests set 300ms against a fake that sleeps 2s). - Sync stages PROCEED on timeout with a stderr warning naming the env knob; a genuinely-dead engine surfaces its real error at the first operation instead of a false config diagnosis. - Consistency everywhere "ok" gated behavior: gstack-gbrain-detect --is-ok exits 0 on timeout, and gen-skill-docs' detection gate accepts it, so a slow engine no longer silently suppresses brain-aware features. 
- Status cache: key now includes the effective probe timeout (raising it invalidates a cached timeout) and GBRAIN_HOME; config detection honors GBRAIN_HOME so relocated-home users stop being misclassified as missing-config. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(bins): cygpath-normalize SCRIPT_DIR for bun imports; surface learnings-log errors (#1950) Under Windows git-bash, pwd yields a POSIX path (/c/Users/...) that Bun on Windows cannot resolve as an ES module specifier. gstack-learnings-log interpolates SCRIPT_DIR into a bun -e import, so every invocation died with "Cannot find module" — and 2>/dev/null swallowed the error, silently dropping every AI-logged learning for Windows users. - 3-line cygpath -m guard in gstack-learnings-log and gstack-question-log (which gains the same import shape in the next commit). Matches the duplicated IS_WINDOWS convention in setup; no shared shell lib exists. - learnings-log adopts question-log's set +e / TMPERR capture pattern wholesale: validation errors now print to stderr. The old `if [ $? -ne 0 ]` check was dead code under set -euo pipefail — the script exited at the failing assignment before reaching it. - New test/bin-windows-bun-import-paths.test.ts: static invariant (any bash bin interpolating $SCRIPT_DIR into a bun -e import must carry the guard) + behavioral end-to-end run invoked via `bash <bin>` — added to the windows-free-tests workflow list so the conversion is proven on the only platform where the bug exists. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(question-log): dedupe INJECTION_PATTERNS via lib/jsonl-store (#1934) bin/gstack-question-log carried a local copy of the injection-pattern list, so pattern fixes to lib/jsonl-store.ts never propagated — including the /override[:\s]/i false-positive fix arriving via community PR #1940. Import the shared hasInjection instead (enabled by the previous commit's cygpath guard). 
question-log also gets the lib's stricter superset (human:, disregard, from-now-on, approve-all patterns). Tests pin the contract in a #1940-order-independent way: an "Override: ignore all previous instructions" header is rejected, "prose overrides the deterministic table" is accepted, and a static invariant keeps local INJECTION_PATTERNS duplicates out of the bin. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(security): community-pulse + both dashboards never report fake zeros (#1947) The security-signaling surface failed open at three layers — every failure mode read as a reassuring "0 attacks" / "0 installs": - community-pulse edge function: supabase-js returns {data,error} without throwing, and all five queries discarded `error` — a DB outage produced real-looking zeros via the SUCCESS path, and the catch (also returning zeros with HTTP 200) was unreachable for query failures. Every query now destructures and throws; the catch serves the stale cache (marked "stale": true) when one exists, else 503 {"error":"pulse_unavailable"}. Success responses carry "status":"ok" so clients can distinguish authoritative data from legacy backends. NOTE: the edge function deploys out-of-band (supabase functions deploy community-pulse). - gstack-security-dashboard: captures the HTTP status; non-200 / network failure / error body / missing section → "unknown — backend error"; jq missing → "unknown — install jq" (the lossy grep fallback broke on nested arrays and under-reported attacks as zero — removed); a 200 without the new marker shows figures with an "unverified (legacy backend)" note. Also fixes a latent display bug: the TOTAL grep matched the digit 7 inside "attacks_last_7_days" and misreported every count. - gstack-community-dashboard: same class — curl || echo "{}" plus grep || echo "0" printed "Weekly active installs: 0" on any failure. Now "unknown — backend error (HTTP N)". 
test/security-dashboard-fallback.test.ts pins the matrix (200+marker, 200-legacy, 503, network failure) x (jq present, jq absent) for both bins: "unknown" states never render as 0. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(telemetry): redact error_message spans before they leave the machine (#1947) error_message was uploaded with only quote/newline escaping — stack traces and failed-API errors can embed credentials, private paths, and hostnames, and the sync path strips only _repo_slug/_branch. New lib/redact-engine.ts export redactFindingSpans(): replaces EVERY finding's span with <REDACTED-{id}> regardless of tier (applyRedactions is the interactive PII-only path and exits nonzero on credential findings, so it can't serve machine egress). Returns null when a span can't be located — callers drop the whole payload rather than risk a leak. gstack-telemetry-log pipes error_message through it at LOG time, so the local JSONL at rest is clean too; surrounding text survives for crash triage. FAIL CLOSED: bun missing, engine error, or non-JSON-string output all null the field. Tests pin: embedded ghp_ token → <REDACTED-github.pat> with context intact; redactor unavailable → null; raw bytes on disk never contain the token. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(redact): prepush guard fails closed on git failure; /ship owns hook install (#1946) Two gaps closed: 1. Fail closed. The git() helper returned "" on ANY non-zero exit or maxBuffer overflow (status null), addedLinesFor produced an empty string, and the push sailed through unscanned — fail-open on exactly the oversized-diff case where a large secret-bearing blob is most likely. The diff call now uses a strict variant that throws; main blocks with a clear message naming the GSTACK_REDACT_PREPUSH=skip escape valve. Probe calls (symbolic-ref, rev-parse, merge-base) keep the permissive helper — their failures are normal control flow. 2. Install path. 
The hook was installed by nothing ("opt-in, installed by nothing" was the issue's words). ./setup runs in the gstack checkout — the wrong repo for a per-project hook — so it gets a one-line hint only. /ship owns per-repo install: config redact_prepush_hook=true + hook missing → silent install (consent already given); config unset + no ~/.gstack/.redact-prepush-prompted marker → one-time machine-wide AskUserQuestion offer, answer persisted. ship/SKILL.md regenerated in this same commit (check-freshness bisect discipline). Tests: unscannable diff (bogus SHAs) → exit 1 + valve named; empty-but- successful diff → exit 0; static asserts pin setup as hint-only and the ship template as the installer surface. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * feat(redact): six new credential patterns — GitLab, HuggingFace, npm, DigitalOcean, Bearer, GCP SA (#1946) Coverage gaps from the #1946 security review, including token types for tooling gstack itself drives (glab): HIGH (block): gitlab.token (glpat-/glptt-/gldt-), huggingface.token (hf_), npm.token (npm_), digitalocean.token (dop_v1_), gcp.service_account (the JSON-escaped "private_key" form that dodges pem.private_key's literal-block match when minified, confirmed by "private_key_id" proximity). MEDIUM (warn): auth.bearer — the most FP-prone shape in the set (docs are full of "Authorization: Bearer <token>"), so it requires header-context proximity and the same entropy>=3.0 + placeholder validator recipe as env.kv. "Bearer YOUR_TOKEN_HERE" never fires; calibration over coverage, per the cries-wolf principle. All shapes are linear-time; test/redact-pattern-lint.test.ts covers them automatically. Engine tests add positive + placeholder-negative cases per pattern. 
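The auth.bearer calibration described above (header-context proximity, then the entropy>=3.0 and placeholder checks) could be sketched roughly as follows. This is a minimal illustration, not the code in lib/redact-patterns.ts: the names `entropyBits`, `looksLikePlaceholder`, and `bearerFires`, the header regex, and the placeholder word list are all assumptions made for the example. Only the 3.0 threshold comes from the commit message.

```typescript
// Hypothetical sketch of a header-context + entropy + placeholder gate.
// None of these names are taken from gstack itself.

/** Shannon entropy in bits per character. */
function entropyBits(s: string): number {
  const chars = Array.from(s);
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  chars.forEach((ch) => counts.set(ch, (counts.get(ch) ?? 0) + 1));
  let bits = 0;
  counts.forEach((n) => {
    const p = n / chars.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}

/** Assumed placeholder shapes: env interpolation, angle brackets, filler words. */
function looksLikePlaceholder(token: string): boolean {
  return (
    token.startsWith("${") ||
    token.startsWith("<") ||
    /(YOUR|EXAMPLE|PLACEHOLDER|CHANGEME|XXXX)/i.test(token)
  );
}

// Header context: the token must follow "Authorization: Bearer".
const BEARER_IN_HEADER = /Authorization:\s*Bearer\s+(\S+)/i;

/** True only when all three signals agree. */
function bearerFires(text: string): boolean {
  const m = BEARER_IN_HEADER.exec(text);
  if (!m) return false;
  const token = (m[1] ?? "").replace(/['"]+$/, ""); // drop a trailing shell quote
  return !looksLikePlaceholder(token) && entropyBits(token) >= 3.0;
}

console.log(bearerFires("Authorization: Bearer YOUR_TOKEN_HERE")); // false
```

Requiring all three signals together is what keeps documentation examples quiet: a bare "Bearer" in prose fails the header check, and a templated token fails the placeholder check before entropy is ever considered.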
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test: coverage-audit additions for the fix wave Ship Step 7 gap-fill (all passing, 248 tests across the touched suites): memory + dream stage probe-timeout proceeds, gbrain-detect override paths, stale-flag passthrough, 200-body-missing-.security fail-closed case, telemetry redaction edges, and credential-pattern edge cases. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix: pre-landing review fixes Review army findings (1 critical, auto-fixed with regression tests): - CRITICAL (security specialist, verified live): redactFindingSpans spliced only the regex capture span, and pem.private_key / gcp.service_account capture just the BEGIN-header — the key body survived "redaction" and shipped via telemetry. Marker-only patterns now drop the whole payload (null, fail closed). Overlapping spans (Bearer+JWT on the same bytes) are coalesced before splicing so stale offsets can't leave partial secret bytes behind. - gitStrict: drop the dead `|| r.status === null` disjunct (null !== 0 already covers it); add the signal-kill/null-status regression test the docstring promised. - security-dashboard human mode flags stale snapshots ("figures may be out of date") instead of presenting frozen counts as current. - community-dashboard marker check uses jq when available — the grep-only variant misclassified whitespaced/reserialized bodies as legacy. - telemetry fail-closed test now shadows bun with a failing stub (deterministic on any host layout); stale "five status cases" describe title renamed. 
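The overlapping-span fix described above (coalesce first, then splice) can be illustrated with a small sketch. It is not the actual redactFindingSpans implementation: the `Span` shape, the function names, and the choice to keep the first finding's id for a merged span are assumptions for illustration. The `<REDACTED-{id}>` marker and the null-on-unlocatable-span behavior follow the commit message.

```typescript
// Hypothetical sketch of coalescing overlapping spans before splicing.
interface Span {
  start: number; // inclusive offset into the text
  end: number; // exclusive offset
  id: string; // pattern id, e.g. "auth.bearer"
}

/** Merge overlapping spans; the merged span keeps the earliest span's id (an assumption). */
function coalesceSpans(spans: Span[]): Span[] {
  const sorted = [...spans].sort((a, b) => a.start - b.start || a.end - b.end);
  const merged: Span[] = [];
  for (const s of sorted) {
    const last = merged[merged.length - 1];
    if (last && s.start < last.end) {
      last.end = Math.max(last.end, s.end);
    } else {
      merged.push({ ...s });
    }
  }
  return merged;
}

/** Returns null (fail closed) when any span cannot be located in the text. */
function redactSpansSketch(text: string, spans: Span[]): string | null {
  for (const s of spans) {
    if (s.start < 0 || s.end > text.length || s.start >= s.end) return null;
  }
  let out = text;
  const merged = coalesceSpans(spans);
  // Splice right to left so earlier offsets stay valid after each replacement.
  for (let i = merged.length - 1; i >= 0; i--) {
    const s = merged[i];
    out = out.slice(0, s.start) + `<REDACTED-${s.id}>` + out.slice(s.end);
  }
  return out;
}

console.log(
  redactSpansSketch("x SECRETTOKEN y", [
    { start: 2, end: 9, id: "a" },
    { start: 6, end: 13, id: "b" },
  ]),
); // x <REDACTED-a> y
```

Without the merge step, replacing the first span shifts the text under the second span's offsets, which is how partial secret bytes can survive.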
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix: adversarial review fixes (Claude + Codex cross-model passes) Both adversarial passes ran against the wave; every FIXABLE finding landed with a regression test: - probeTimeoutMs clamps to >=1ms: a fractional override floored to 0, and execFileSync treats timeout:0 as NO timeout — the probe that exists to bound hangs could hang forever (found by both models independently). - /ship silent hook install now requires the hooks dir to live inside .git: with core.hooksPath (husky's COMMITTED .husky/), the chaining installer would have renamed the team's committed pre-push and written a machine-local wrapper into the working tree (found by both models). - gstack-config gbrain-refresh accepts the "timeout" status — the last consumer still gating on literal "ok" (Codex); gstack-gbrain-detect's config-derived fields honor GBRAIN_HOME so the detection JSON can't report status ok alongside config_exists false (Codex). - prepush: a remote sha absent locally (shallow clone / stale fetch) falls back to the merge-base/empty-tree range — scans MORE, never blocks a legitimate push into training users toward --no-verify. - dashboards: curl's own 000 no longer doubles to "HTTP 000000"; the community dashboard flags stale snapshots like the security one; array sections parse via jq (the sed/grep loops truncated at the first ']'); the no-jq marker grep tolerates whitespace. - telemetry: multi-line redactor output nulls the field instead of corrupting the JSONL record; setup's hint fires only when the config key is genuinely unset (an explicit false is a recorded decline); the /ship prompt marker honors GSTACK_HOME. Kept as designed (cross-model tension noted): Bearer stays MEDIUM in the prepush gate — a HIGH Bearer would block every docs example; the entropy validator can't eliminate that FP class, and MEDIUM warns visibly. 
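The probeTimeoutMs clamp mentioned above can be sketched as a small pure function. This is an illustration under stated assumptions, not gstack's code: the name `resolveProbeTimeoutMs` and the fallback to the default on missing or invalid input are hypothetical. The 15s default, the floor-to-0 hazard, and the >=1ms clamp come from the commit message.

```typescript
// Hypothetical sketch; the real probeTimeoutMs in gstack may differ.
const DEFAULT_PROBE_TIMEOUT_MS = 15000; // default deadline named in the commit message

/**
 * Resolve the probe deadline from a raw override string
 * (for example the value of GSTACK_GBRAIN_PROBE_TIMEOUT_MS).
 * Falling back to the default on missing or invalid input is an assumption.
 */
function resolveProbeTimeoutMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_PROBE_TIMEOUT_MS;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed) || parsed <= 0) return DEFAULT_PROBE_TIMEOUT_MS;
  // Floor, then clamp: "0.4" floors to 0, and a timeout of 0 means "no timeout".
  return Math.max(1, Math.floor(parsed));
}

console.log(resolveProbeTimeoutMs("0.4")); // 1
```

The clamp matters because the floored value is handed to a call where 0 disables the deadline entirely, so the smallest legal override has to be 1.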
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * chore: bump version and changelog (v1.57.11.0) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs: P1 TODO — eval harness live progress + incremental persistence Root-caused during this ship: a killed eval run was indistinguishable from a healthy one for hours (per-file output buffering across mega test files, no incremental eval-store writes, no honest liveness signal). Full context and starting points in the entry. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test: fix operational-learning E2E fixture — copy lib/jsonl-store.ts Pre-existing breakage, proven on main: gstack-learnings-log has imported lib/jsonl-store.ts (shared injection patterns) since v1.57.5.0 / #1910, but the fixture copies only the bin scripts — the bin exits 1 before writing anything, on main silently (stderr swallowed) and on this branch loudly (the #1950 error-surfacing made the four-day-old failure visible). A real install always ships bin/ and lib/ together; the fixture now does too. Verified: the fixture-shaped invocation writes the learning (exit 0) with lib present, exits 1 on both main and this branch without it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(ios-qa): isolate E2E tests under --concurrent (3 real races) The ios-qa E2E file failed intermittently under `bun test --concurrent` (the eval harness default). Three distinct shared-state races, all fixed: 1. Shared pidfile: a module-level `workDir` reassigned in beforeEach was clobbered by parallel tests, so concurrent daemons collided on the same pidfile and the loser returned `already_running`. Each test now gets its own dir via makeWorkDir(). 2. process.env path globals: tests set GSTACK_IOS_AUDIT_PATH / _ATTEMPTS_PATH / _ALLOWLIST_PATH on the shared process env; concurrent tests stomped each other's audit/attempts destinations. 
Threaded auditPath/attemptsPath/allowlistPath through DaemonOptions (and mintForCaller) as explicit args — env is no longer load-bearing. 3. afterEach cleanup race: the per-test cleanup drained a shared dir array, so the first test to finish deleted still-running tests' workDirs mid-assertion. Moved to afterAll (cleans once, after all settle). Verified: 5/5 clean full-suite runs at --max-concurrency 15 (was intermittent); daemon unit suite 91/91; daemon source compiles. The paths default to the env-derived locations when options are omitted, so the production CLI path is unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * test(pty): pin spawned claude to EVALS model chain (default claude-sonnet-4-6) launchClaudePty spawned the interactive `claude` TUI with no --model flag, so the child inherited the operator's ~/.claude/settings.json model. On a slow-thinking model that meant 5+ min of extended thinking on empty plan-mode context, timing out the plan-mode smoke tests regardless of contention. Pin the model via opts.model ?? EVALS_MODEL ?? 'claude-sonnet-4-6' — byte-identical to session-runner.ts:144, so PTY and `claude -p` evals always agree. Pushed before extraArgs (last flag wins, so a per-test --model still overrides). Placement leaves the spawn region byte-stable for a clean merge with the in-flight hermetic-env branch. Plumbed model through the three plan-skill wrappers. Static-grep tripwires guard the pin, its fallback chain, the before-extraArgs ordering, and all three wrapper forwards. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(pty): detect markdown bold-bullet prose AUQs (fixes office-hours smoke) office-hours auto-mode renders its mode question as `- **Building a startup**` markdown bullets (office-hours/SKILL.md.tmpl:102) with no letter/number marker. 
isProseAUQVisible only matched `A)`-style lettered or `1.`-style numbered options, so the question went undetected: the model surfaced it at ~2m19s (well under the 300s budget) but the harness kept scoring the run "working" off the spinner glyphs and timed out — a false timeout on a question that was already on screen. Add Pattern 3: when an interrogative line ('?') is present AND 3+ bold-bullet markers (`- **`) appear in the 4KB tail, classify as a prose AUQ. Bold is the discriminator vs incidental prose bullets; the line anchor is dropped (stripAnsi can collapse option lines) and the existing `❯ 1.` cursor gate still defers to a live native list. Wires through the existing classifyVisible 'asked' path and the timeout high-water-mark, so office-hours now classifies 'asked' instead of 'timeout'. Five unit cases: the office-hours render passes; no-'?', <3-bullet, plain-bullet, and native-cursor cases stay false. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(pty): detect stripAnsi-collapsed prose AUQs + judge spinner-precedence The plan-eng/plan-design plan-mode + finding-floor smokes timed out even when the skill HAD rendered a complete prose AskUserQuestion and was waiting: the PTY strips cursor-positioning escapes, collapsing the option newlines/spaces so "A) ..." arrives as "A(recommended)" / "-B:" and "Reply with A, B, or C" as "ReplywithA,B,orC". Every line-anchored detector (Patterns 1-3) returns false on those bytes, so proseAUQEverObserved never latched and the run timed out on a question that was already on screen. Add Pattern 4/5: a two-signal collapsed-form detector — a reply/recommendation marker (space-insensitive "reply with [A-D]", "Recommendation:", or "(recommended)") AND 2+ distinct A-D letters each punctuated by ) : or (. The conjunction is what separates a real AUQ from incidental report prose; verified true on the verbatim failing-run buffers where Patterns 1-3 return false. 
Also fix the Haiku judge spinner bias: of 614 verdicts, 569 were 'working' and 95 of those noted a question was visible — Claude Code keeps the spinner animating at an idle prose decision, so the judge coin-flipped. Add a precedence override: when an option list AND a Recommendation/Reply instruction are both visible, classify WAITING even with spinner glyphs. Kept the strict dual-signal gate (never option-list-alone) so auto-decide-preserved doesn't flip. 5 unit tests pin the two-signal contract (2 true on real collapsed bytes, 3 false guards). 90 -> 95 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(plan-review): ask-first scope gate for plan-eng + plan-design review On an empty/cold invocation, plan-eng-review and plan-design-review would dive straight into repo exploration (plan-eng) or a 7-pass mockup+audit (plan-design) and only ask the user much later, if at all. plan-ceo-review already asks first via an unconditional Step-0 gate and behaves well; these two did not. Add a hard-STOP scope gate as the FIRST operational instruction in each skill (above the design-doc check / pre-review audit / mockup defaults it explicitly overrides): the first tool call must be AskUserQuestion confirming the review target, before any git/Read/Grep/Glob/Bash or mockup generation. Under --disallowedTools the options render as plain column-0 lettered prose with a Recommendation + "Reply with A, B, or C" line so the answer is detectable. This is correct cold-start UX (confirm what to review before grinding a full review on nothing) and it is the product half of the plan-mode smoke fix; the harness collapsed-form detector is the deterministic half that catches the ask however it renders. Templates + regenerated SKILL.md (default variant). 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(tiers): reclassify stochastic plan-eng/plan-design ask-first smokes as periodic plan-eng-review and plan-design-review run a long explore/audit before their first AskUserQuestion, so whether the plan-mode + finding-floor smokes reach a terminal outcome within the 300s/600s budget depends on stochastic ask-first compliance (measured ~50-67%/run even with the hardened gate). Per the "non-deterministic -> periodic" tiering rule, move the four affected smokes (plan-eng/plan-design review-plan-mode + finding-floor) to periodic. The deterministic harness fix (collapsed-form detector + judge precedence) and the ask-first gate lift these from always-failing to mostly-passing and are the real product+harness improvements; periodic monitoring tracks the rate weekly without blocking PRs on an LLM coin-flip. plan-ceo/plan-devex ask-first reliably and stay gate-tier. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci(evals): gate the deterministic PTY plan-mode smokes in CI The real-PTY plan-mode smokes never ran in CI — the gate was local-only. Add an e2e-pty-plan-smoke matrix suite running the two deterministically-reliable ones (office-hours-auto-mode, plan-mode-no-op) so a regression there blocks PRs. The stochastic plan-eng/plan-design ask-first smokes stay periodic (touchfiles E2E_TIERS) and are not CI-gated. A fresh CI container has no ~/.claude.json, so the spawned interactive `claude` would wedge on the onboarding + API-key-approval dialog. Add a scoped seed step (hasCompletedOnboarding + key approval, its own ANTHROPIC_API_KEY env) before the run — mirrors what the hermetic E2E child env seeds. Per-suite timeout override (35 min) via matrix.suite.timeout so the PTY suite has headroom for --retry 2 without bumping the other 12 suites. Report runner count 12 -> 13. Validate via workflow_dispatch before relying on the gate (PTY-in-CI is new). 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci(evals): install gstack skill registry for the PTY smoke suite The first dry-run of e2e-pty-plan-smoke failed: the spawned interactive `claude` printed "Unknown command: /plan-ceo-review". .claude/skills is gitignored, so a fresh CI checkout has no gstack skill registry and the TUI can't resolve /office-hours or /plan-ceo-review. Add a Register step (scoped to the suite, after Seed, before Run) that mirrors setup's --no-prefix user-scoped registry minimally: $HOME/.claude/skills/gstack -> repo (resolves the preambles' absolute ~/.claude/skills/gstack/bin/* and <skill>/sections/* paths) + per-skill SKILL.md/sections symlinks for the two skills these tests invoke. HOME is /github/home in this container and the runner adds no HOME/CLAUDE_CONFIG_DIR override (no hermetic mode), so $HOME is the right anchor — the Seed step already proved claude reads it. No ./setup (binary build + Chromium + fonts + /dev/tty prompt); SKILL.md + bin/ + sections/ are committed. Self-validating: fails the step loudly on a dangling symlink or missing `name:` frontmatter, so a moved target surfaces here instead of as a silent 35-min "Unknown command" timeout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v1.58.4.0) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
455 lines
19 KiB
TypeScript
/**
 * Unit tests for lib/redact-engine.ts + lib/redact-patterns.ts.
 *
 * One positive test per pattern, plus FP-filters, validators (Luhn/entropy/
 * RFC1918), email allowlist, no-promotion visibility semantics, tool-fence
 * degrade, normalization (zero-width / homoglyph / entity), oversize fail-closed,
 * and pure-function purity.
 */
import { describe, test, expect } from "bun:test";
import {
  scan,
  exitCodeFor,
  maskPreview,
  normalizeWithMap,
  redactFindingSpans,
  type RepoVisibility,
} from "../lib/redact-engine";
import {
  PATTERNS,
  luhnValid,
  shannonEntropy,
  isPublicIPv4,
  isPlaceholderSpan,
} from "../lib/redact-patterns";

function ids(text: string, vis: RepoVisibility = "private"): string[] {
  return scan(text, { repoVisibility: vis }).findings.map((f) => f.id);
}

describe("HIGH credential patterns", () => {
  const cases: Array<[string, string]> = [
    ["aws.access_key", "key = AKIA1234567890ABCDEF"],
    ["aws.secret_key", "aws_secret_access_key = AbCdEfGhIjKlMnOpQrStUvWxYz0123456789AbCd"],
    ["github.pat", "token ghp_" + "1234567890abcdefghijklmnopqrstuvwxyz"],
    ["github.oauth", "gho_" + "1234567890abcdefghijklmnopqrstuvwxyz"],
    ["github.server", "ghs_1234567890abcdefghijklmnopqrstuvwxyz"],
    ["github.fine_grained", "github_pat_" + "A".repeat(82)],
    ["anthropic.key", "sk-ant-" + "api03-abcdefghij1234567890XYZ"],
    ["openai.key", "sk-proj-" + "a".repeat(40)],
    ["sendgrid.key", "SG." + "a".repeat(22) + "." + "b".repeat(43)],
    ["stripe.secret", "sk_live_" + "a".repeat(30)],
    ["slack.token", "xox" + "b-1234567890-abcdefghijklmnop"],
    ["slack.webhook", "https://hooks.slack.com/services/T00000000/B11111111/" + "a".repeat(24)],
    ["discord.webhook", "https://discord.com/api/webhooks/123456789012345678/" + "a".repeat(60)],
    ["pem.private_key", "-----BEGIN RSA PRIVATE KEY-----"],
    // #1946 coverage-gap additions
    ["gitlab.token", "remote: glpat-" + "Ab12Cd34Ef56Gh78Ij90"],
    ["gitlab.token", "trigger glptt-" + "a1b2c3d4e5f6a7b8c9d0e1f2"],
    ["gitlab.token", "deploy gldt-" + "Zy98Xw76Vu54Ts32Rq10"],
    ["huggingface.token", "hf_" + "AbCdEfGhIjKlMnOpQrStUvWxYz012345"],
    ["npm.token", "npm_" + "a1B2c3D4e5F6g7H8i9J0k1L2m3N4o5P6q7R8"],
    ["digitalocean.token", "dop_v1_" + "0123456789abcdef".repeat(4)],
    [
      "gcp.service_account",
      '{"private_key_id": "abc123", "private_key": "-----BEGIN PRIVATE KEY-----\\nMIIE..."}',
    ],
  ];
  for (const [id, text] of cases) {
    test(`flags ${id}`, () => {
      expect(ids(text)).toContain(id);
    });
  }

  // #1868 — modern OpenAI keys use base64url bodies (with - and _). The old
  // [A-Za-z0-9]{32,} regex stopped at the first separator and missed them all,
  // failing a HIGH credential OPEN through the redaction gate.
  test("openai.key flags modern sk-proj-/sk-svcacct-/sk-admin- shapes (#1868)", () => {
    const missed = [
      "sk-proj-Ab12_Cd34-Ef56Gh78Ij90Kl12Mn34Op56Qr78St90Uv",
      "sk-svcacct-abc_def-ghijklmnopqrstuvwxyz0123456789ABCDEF",
      "sk-admin-AAAA_BBBB-CCCC_DDDD-EEEE_FFFF-GGGG_HHHH1234",
    ];
    for (const key of missed) {
      expect(ids(`OPENAI_API_KEY=${key}`)).toContain("openai.key");
    }
    // legacy contiguous shape still flags
    expect(ids("sk-proj-" + "a".repeat(40))).toContain("openai.key");
  });

  test("openai.key does not over-match prose / malformed sk- strings (#1868 calibration)", () => {
    // HIGH tier BLOCKS, so false positives on prose are costly. None of these
    // should flag as openai.key.
    const benign = [
      "the sk-learning-rate-schedule-was-tuned-carefully", // hyphenated prose
      "sk--double-dash-typo-not-a-real-key",
      "use sk-proj for the project prefix in docs", // no body
      "sk-short", // too short, no prefix
    ];
    for (const text of benign) {
      expect(ids(text)).not.toContain("openai.key");
    }
  });

  test("twilio.auth_token needs an SID nearby", () => {
    const sid = "AC" + "a".repeat(32);
    const tok = "b".repeat(32);
    expect(ids(`account ${sid} token ${tok}`)).toContain("twilio.auth_token");
    // bare 32-hex with no SID nearby should NOT flag as twilio
    expect(ids(`random ${tok} here`)).not.toContain("twilio.auth_token");
  });

  test("db.url_with_password flags real password, skips placeholder/env-var", () => {
    expect(ids("postgres://user:s3cretP@ss@db.example.com/app")).toContain("db.url_with_password");
    expect(ids("postgres://user:${DB_PASSWORD}@host/app")).not.toContain("db.url_with_password");
  });

  test("all HIGH patterns block (exit 3)", () => {
    const r = scan("AKIA1234567890ABCDEF", { repoVisibility: "private" });
    expect(exitCodeFor(r)).toBe(3);
  });
});

describe("MEDIUM demoted credential-shaped patterns (TENSION-1)", () => {
  test("stripe.publishable is MEDIUM not HIGH", () => {
    const f = scan("pk_live_" + "a".repeat(30), { repoVisibility: "private" }).findings.find(
      (x) => x.id === "stripe.publishable",
    );
    expect(f?.tier).toBe("MEDIUM");
  });
  test("google.api_key is MEDIUM", () => {
    const f = scan("AIza" + "a".repeat(35), { repoVisibility: "private" }).findings.find(
      (x) => x.id === "google.api_key",
    );
    expect(f?.tier).toBe("MEDIUM");
  });
  test("jwt is MEDIUM", () => {
    const jwt = "eyJhbGciOiJ.eyJzdWIiOiI." + "x".repeat(20);
    const f = scan(jwt, { repoVisibility: "private" }).findings.find((x) => x.id === "jwt");
    expect(f?.tier).toBe("MEDIUM");
  });
  test("env.kv fires on high-entropy, skips placeholder", () => {
    expect(ids("API_TOKEN=8Fk2pQ9vXz4wL7mN3rT6yB1cD5eG0hJ")).toContain("env.kv");
    expect(ids("API_KEY=changeme")).not.toContain("env.kv");
    expect(ids("API_KEY=${MY_VAR}")).not.toContain("env.kv");
  });

  // #1946 — Bearer is the most FP-prone shape in the wave: docs and examples
  // are full of "Authorization: Bearer <token>". MEDIUM + header proximity +
  // the env.kv entropy recipe keep it calibrated.
  test("auth.bearer fires on a high-entropy token in header context", () => {
    const text = "curl -H 'Authorization: Bearer 8Fk2pQ9vXz4wL7mN3rT6yB1cD5eG0hJq'";
    const f = scan(text, { repoVisibility: "private" }).findings.find(
      (x) => x.id === "auth.bearer",
    );
    expect(f).toBeDefined();
    expect(f?.tier).toBe("MEDIUM");
  });
  test("auth.bearer skips placeholders and env interpolations", () => {
    expect(ids("Authorization: Bearer YOUR_TOKEN_HERE_PLACEHOLDER")).not.toContain("auth.bearer");
    expect(ids("Authorization: Bearer ${ACCESS_TOKEN_FROM_ENV}")).not.toContain("auth.bearer");
  });
  test("auth.bearer requires header context (bare 'Bearer x' prose doesn't fire)", () => {
    expect(ids("the Bearer 8Fk2pQ9vXz4wL7mN3rT6yB1cD5eG0hJq walked in")).not.toContain(
      "auth.bearer",
    );
  });
});

describe("#1946 pattern negatives (placeholders never fire)", () => {
  test("short or placeholder shapes don't trip the new HIGH patterns", () => {
    expect(ids("glpat-xxxx")).not.toContain("gitlab.token");
    expect(ids("hf_token")).not.toContain("huggingface.token");
    expect(ids("npm_install")).not.toContain("npm.token");
    expect(ids("dop_v1_short")).not.toContain("digitalocean.token");
    // pem header WITHOUT the GCP JSON shape stays pem.private_key only.
    expect(ids("-----BEGIN PRIVATE KEY-----")).not.toContain("gcp.service_account");
  });
});

describe("PII patterns", () => {
  test("email flags + is autoRedactable", () => {
    const f = scan("ping alice@corp.io please", { repoVisibility: "private" }).findings.find(
      (x) => x.id === "pii.email",
    );
    expect(f).toBeTruthy();
    expect(f?.autoRedactable).toBe(true);
  });
  test("email allowlist: example.com, noreply, self, repo-public", () => {
    expect(ids("see user@example.com")).not.toContain("pii.email");
    expect(ids("from noreply@github.com")).not.toContain("pii.email");
    expect(
      scan("me@garry.dev", { repoVisibility: "private", selfEmail: "me@garry.dev" }).findings,
    ).toHaveLength(0);
    expect(
      scan("bob@acme.co", { repoVisibility: "private", repoPublicEmails: ["bob@acme.co"] }).findings,
    ).toHaveLength(0);
  });
  test("phone E.164", () => {
    expect(ids("call +14155550123 now")).toContain("pii.phone.e164");
  });
  test("ssn flags valid, skips 000 octet", () => {
    expect(ids("ssn 123-45-6789")).toContain("pii.ssn");
    expect(ids("000-12-3456")).not.toContain("pii.ssn");
  });
  test("credit card needs Luhn", () => {
    expect(ids("card 4111111111111111")).toContain("pii.cc");
    expect(ids("num 4111111111111112")).not.toContain("pii.cc");
  });
  test("public IP flagged, RFC1918 skipped", () => {
    expect(ids("connect 8.8.8.8")).toContain("pii.ip_public");
    expect(ids("local 192.168.1.5")).not.toContain("pii.ip_public");
    expect(ids("local 10.0.0.1")).not.toContain("pii.ip_public");
  });
});

describe("internal + legal patterns", () => {
  test("internal hostname", () => {
    expect(ids("db1.corp internal host")).toContain("internal.hostname");
  });
  test("localhost url with path", () => {
    expect(ids("hit http://localhost:8080/admin/secrets")).toContain("internal.url_private");
  });
  test("NDA marker", () => {
    expect(ids("This is CONFIDENTIAL material")).toContain("legal.nda_marker");
  });
  test("named criticism needs a capitalized full name nearby", () => {
    expect(ids("John Smith is incompetent at this")).toContain("legal.named_criticism");
    expect(ids("the build is incompet019ently configured".replace("019", ""))).not.toContain(
      "legal.named_criticism",
    );
  });
});

describe("LOW patterns surface only", () => {
  test("user path is LOW", () => {
    const f = scan("/Users/bob/secret/config", { repoVisibility: "private" }).findings.find(
      (x) => x.id === "internal.user_path",
    );
    expect(f?.tier).toBe("LOW");
  });
  test("TODO marker is LOW", () => {
    const f = scan("TODO(alice) fix later", { repoVisibility: "private" }).findings.find(
      (x) => x.id === "hygiene.todo",
    );
    expect(f?.tier).toBe("LOW");
  });
});

describe("placeholder suppression (per-span)", () => {
  test("AWS docs EXAMPLE key not flagged", () => {
    expect(ids("AKIAIOSFODNN7EXAMPLE")).not.toContain("aws.access_key");
  });
  test("your_ prefix not flagged", () => {
    expect(isPlaceholderSpan("your_api_key")).toBe(true);
  });
  test("a real secret on a line that ALSO contains EXAMPLE still flags", () => {
    // line-based suppression would wrongly skip this; per-span must catch it.
    expect(ids("# EXAMPLE usage\nkey AKIA1234567890ABCDEF")).toContain("aws.access_key");
  });
});

describe("no visibility-based tier promotion (TENSION-2-followup)", () => {
  test("email stays MEDIUM on both private and public", () => {
    const priv = scan("x@corp.io", { repoVisibility: "private" }).findings[0];
    const pub = scan("x@corp.io", { repoVisibility: "public" }).findings[0];
    expect(priv.tier).toBe("MEDIUM");
    expect(pub.tier).toBe("MEDIUM");
    expect(pub.severity).toBe("MEDIUM"); // NOT promoted to HIGH
    expect(pub.repoVisibility).toBe("public"); // recorded for sterner wording
  });
  test("demoted credential patterns stay MEDIUM on public", () => {
    const pub = scan("pk_live_" + "a".repeat(30), { repoVisibility: "public" }).findings[0];
    expect(pub.severity).toBe("MEDIUM");
  });
  test("unknown visibility treated as public for wording, still no promotion", () => {
    const r = scan("x@corp.io", { repoVisibility: "unknown" });
    expect(r.findings[0].severity).toBe("MEDIUM");
  });
});

describe("tool-attributed fence WARN-degrade (TENSION-3)", () => {
  test("placeholder-shaped credential in tool fence → WARN", () => {
    const text = "```codex-review\nfound your_aws_key AKIAIOSFODNN7EXAMPLE in code\n```";
    const r = scan(text, { repoVisibility: "private" });
    // the EXAMPLE key is suppressed as placeholder; verify a non-credential note doesn't block
    expect(r.counts.HIGH).toBe(0);
  });
  test("live-format credential in tool fence STILL blocks", () => {
    const text = "```codex-review\nleaked AKIA1234567890ABCDEF here\n```";
    const r = scan(text, { repoVisibility: "private" });
    expect(r.counts.HIGH).toBe(1); // not degraded — live format
  });
  test("AKIA outside any fence blocks", () => {
    expect(exitCodeFor(scan("AKIA1234567890ABCDEF", {}))).toBe(3);
  });
});

describe("normalization", () => {
  test("zero-width chars inside a key are stripped before matching", () => {
    const zwsp = "\u200B"; // zero-width space, written as an escape so it stays visible
    const broken = "AKIA1234567890" + zwsp + "ABCDEF";
    expect(ids(broken)).toContain("aws.access_key");
  });
  test("HTML entity decode", () => {
    const { normalized } = normalizeWithMap("a &amp; b");
    expect(normalized).toBe("a & b");
  });
  test("offset map points back into original", () => {
    const input = "xy\u200Bz"; // zero-width space between 'y' and 'z'
    const { normalized, map } = normalizeWithMap(input);
    expect(normalized).toBe("xyz");
    // 'z' is at normalized index 2, original index 3
    expect(map[2]).toBe(3);
  });
});

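// Illustrative sketch only, NOT the engine's normalizeWithMap: one way to strip
// zero-width characters while recording, for every kept character, its index in
// the original string. The name sketchNormalizeWithMap and the exact set of
// stripped code points are assumptions made for this example; HTML entity
// decoding is deliberately left out.
function sketchNormalizeWithMap(input: string): { normalized: string; map: number[] } {
  const zeroWidth = /[\u200B\u200C\u200D\uFEFF]/; // assumed set of zero-width code points
  let normalized = "";
  const map: number[] = []; // map[i] = index in `input` of normalized[i]
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (zeroWidth.test(ch)) continue; // dropped characters get no map entry
    normalized += ch;
    map.push(i);
  }
  return { normalized, map };
}
// With input "xy\u200Bz", 'z' lands at normalized index 2 and maps back to
// original index 3, which is the relationship the offset-map test asserts.
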
describe("oversize fails CLOSED", () => {
  test("input over the byte cap returns a single blocking HIGH finding", () => {
    const big = "a".repeat(2000);
    const r = scan(big, { maxBytes: 1000 });
    expect(r.oversize).toBe(true);
    expect(r.counts.HIGH).toBe(1);
    expect(r.findings[0].id).toBe("engine.input_too_large");
    expect(exitCodeFor(r)).toBe(3);
  });

  // #1824: a malformed --max-bytes used to reach the engine as NaN. `byteLen >
  // NaN` is always false, silently disabling the fail-closed guard. The engine
  // guardrail must fall back to the default cap for any non-finite / <= 0 value.
  test("NaN maxBytes falls back to the default cap (does NOT disable the guard)", () => {
    const big = "a".repeat(2 * 1024 * 1024); // > 1 MiB default cap
    const r = scan(big, { maxBytes: NaN });
    expect(r.oversize).toBe(true);
    expect(r.findings[0].id).toBe("engine.input_too_large");
    expect(exitCodeFor(r)).toBe(3);
  });

  test("negative / zero maxBytes falls back to the default cap", () => {
    // negative would make `byteLen > -5` always true (block everything);
    // the guardrail normalizes it to the default instead.
    const small = "ok";
    expect(scan(small, { maxBytes: -5 }).oversize).toBeFalsy();
    expect(scan(small, { maxBytes: 0 }).oversize).toBeFalsy();
    const big = "a".repeat(2 * 1024 * 1024);
    expect(scan(big, { maxBytes: -5 }).oversize).toBe(true);
  });
});

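// Illustrative sketch only, NOT the engine's code: a guard of the kind the
// #1824 comment describes, where any non-finite or <= 0 cap falls back to a
// default so `byteLen > NaN` can never silently disable the check. The names
// below are hypothetical; the 1 MiB default is taken from the "> 1 MiB default
// cap" comment in the NaN test and is otherwise an assumption.
const SKETCH_DEFAULT_MAX_BYTES = 1024 * 1024;

function sketchEffectiveMaxBytes(maxBytes?: number): number {
  const usable = typeof maxBytes === "number" && Number.isFinite(maxBytes) && maxBytes > 0;
  return usable ? (maxBytes as number) : SKETCH_DEFAULT_MAX_BYTES;
}

function sketchIsOversize(text: string, maxBytes?: number): boolean {
  // Measure UTF-8 bytes, not UTF-16 code units, so multi-byte text is counted fairly.
  const byteLen = new TextEncoder().encode(text).length;
  return byteLen > sketchEffectiveMaxBytes(maxBytes);
}
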
describe("validators", () => {
  test("luhn", () => {
    expect(luhnValid("4111111111111111")).toBe(true);
    expect(luhnValid("4111111111111112")).toBe(false);
  });
  test("entropy", () => {
    expect(shannonEntropy("aaaaaaaa")).toBeLessThan(1);
    expect(shannonEntropy("8Fk2pQ9vXz4wL7mN")).toBeGreaterThan(3);
  });
  test("isPublicIPv4", () => {
    expect(isPublicIPv4("8.8.8.8")).toBe(true);
    expect(isPublicIPv4("10.1.2.3")).toBe(false);
    expect(isPublicIPv4("172.16.5.5")).toBe(false);
    expect(isPublicIPv4("999.1.1.1")).toBe(false);
  });
});

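// Illustrative sketch only, NOT the engine's validators: textbook versions of
// the Luhn checksum and per-character Shannon entropy (in bits) that behave
// like the assertions above. The names sketchLuhnValid and
// sketchShannonEntropy are hypothetical and exist only for this example.
function sketchLuhnValid(digits: string): boolean {
  if (!/^\d+$/.test(digits)) return false;
  let sum = 0;
  let doubleNext = false; // the rightmost digit is taken as-is
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of the product
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

function sketchShannonEntropy(s: string): number {
  const chars = Array.from(s);
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  chars.forEach((ch) => counts.set(ch, (counts.get(ch) ?? 0) + 1));
  let bits = 0;
  counts.forEach((c) => {
    const p = c / chars.length;
    bits -= p * Math.log2(p);
  });
  return bits;
}
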
describe("masking + purity", () => {
  test("preview never leaks more than 4 leading chars", () => {
    expect(maskPreview("AKIA1234567890ABCDEF")).toBe("AKIA********…");
    expect(maskPreview("abc")).toBe("abc");
  });
  test("scan is pure — same input twice yields identical findings", () => {
    const a = scan("AKIA1234567890ABCDEF x@corp.io", { repoVisibility: "public" });
    const b = scan("AKIA1234567890ABCDEF x@corp.io", { repoVisibility: "public" });
    expect(a).toEqual(b);
  });
});

describe("redactFindingSpans — machine-egress masking (#1947)", () => {
  test("clean input passes through unchanged", () => {
    const text = "push failed: remote rejected the branch";
    expect(redactFindingSpans(text, { repoVisibility: "private" })).toBe(text);
  });

  test("a single finding's span becomes <REDACTED-{id}>, context survives", () => {
    const token = "ghp_" + "1234567890abcdefghijklmnopqrstuvwxyz";
    const out = redactFindingSpans(`auth ${token} rejected`, { repoVisibility: "private" });
    expect(out).toBe("auth <REDACTED-github.pat> rejected");
  });

  test("multiple findings are all replaced (right-to-left splice keeps offsets valid)", () => {
    const pat = "ghp_" + "1234567890abcdefghijklmnopqrstuvwxyz";
    const aws = "AKIA1234567890ABCDEF";
    const out = redactFindingSpans(`first ${aws} then ${pat} end`, {
      repoVisibility: "private",
    });
    expect(out).toBe("first <REDACTED-aws.access_key> then <REDACTED-github.pat> end");
  });

  test("fails closed (null) when a span cannot be relocated — never raw passthrough", () => {
    // env.kv's span (the value) starts well past the regex match start (the
    // var name), so locateSpan's rewind-2 re-exec misses it. The contract is
    // null → caller drops the whole payload. The one thing that must never
    // happen is the secret surviving in the output.
    const secret = "8Fk2pQ9vXz4wL7mN3rT6yB1cD5eG0hJq";
    const out = redactFindingSpans(`API_KEY=${secret}`, { repoVisibility: "private" });
    if (out !== null) {
      // If locateSpan ever learns to find context-prefixed spans, masking
      // must actually mask.
      expect(out).not.toContain(secret);
    } else {
      expect(out).toBeNull();
    }
  });

  test("multiline input redacts a finding past the first line (locateSpan line/col path)", () => {
    const token = "ghp_" + "1234567890abcdefghijklmnopqrstuvwxyz";
    const out = redactFindingSpans(`line one\nline two has ${token}\nline three`, {
      repoVisibility: "private",
    });
    expect(out).toBe("line one\nline two has <REDACTED-github.pat>\nline three");
  });

  // Pre-landing review CRITICAL: pem.private_key and gcp.service_account
  // capture only the HEADER, not the key material — a span splice would
  // redact the marker and forward the key body. Marker-only patterns must
  // drop the whole payload.
  test("PEM private key → null (header-only span must not forward the key body)", () => {
    const msg =
      "deploy failed: -----BEGIN PRIVATE KEY-----\nMIIEvQIBADANBgkqhkiG9w0BAQEFAASC\n-----END PRIVATE KEY-----";
    expect(redactFindingSpans(msg, { repoVisibility: "private" })).toBeNull();
  });

  test("GCP service-account JSON → null (key body follows the captured marker)", () => {
    const msg =
      'config dump: {"private_key_id": "abc123", "private_key": "-----BEGIN PRIVATE KEY-----\\nMIIEvQIBADANBg..."}';
    expect(redactFindingSpans(msg, { repoVisibility: "private" })).toBeNull();
  });

  // Pre-landing review: overlapping spans (a Bearer token that is also a
  // JWT) must coalesce — independent splices apply stale offsets and can
  // leave trailing secret bytes or mangled markers.
  test("overlapping spans (Bearer JWT fires auth.bearer + jwt) never leak and produce clean markers", () => {
    const jwt = "eyJ" + "a".repeat(20) + ".eyJ" + "b".repeat(20) + "." + "c".repeat(20);
    const out = redactFindingSpans(`Authorization: Bearer ${jwt}`, { repoVisibility: "private" });
    expect(out).not.toBeNull();
    expect(out!).not.toContain("eyJ");
    expect(out!).not.toContain("aaaa");
    expect(out!).not.toContain("cccc");
    // One coalesced, well-formed marker — no truncated fragments.
    expect(out!).toMatch(/^Authorization: Bearer <REDACTED-[a-z._+]+>$/);
  });
});

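// Illustrative sketch only, NOT the real redactFindingSpans: the two ideas the
// tests above name, coalescing overlapping spans and then splicing right to
// left so earlier offsets stay valid. SketchSpan and sketchRedactSpans are
// hypothetical names; joining the ids of merged spans with "+" is an
// assumption suggested only by the `[a-z._+]+` marker regex in the last test.
interface SketchSpan {
  start: number; // inclusive offset into the text
  end: number; // exclusive offset into the text
  id: string;
}

function sketchRedactSpans(text: string, spans: SketchSpan[]): string {
  const sorted = [...spans].sort((a, b) => a.start - b.start);
  const merged: SketchSpan[] = [];
  for (const s of sorted) {
    const last = merged[merged.length - 1];
    if (last && s.start < last.end) {
      // Overlap: widen the previous span instead of splicing twice.
      last.end = Math.max(last.end, s.end);
      if (last.id.split("+").indexOf(s.id) === -1) last.id += "+" + s.id;
    } else {
      merged.push({ ...s });
    }
  }
  let out = text;
  // Right-to-left: replacing a later span never shifts the offsets of earlier ones.
  for (let i = merged.length - 1; i >= 0; i--) {
    const m = merged[i];
    out = out.slice(0, m.start) + `<REDACTED-${m.id}>` + out.slice(m.end);
  }
  return out;
}
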
describe("taxonomy integrity", () => {
  test("every pattern has a unique id", () => {
    const set = new Set(PATTERNS.map((p) => p.id));
    expect(set.size).toBe(PATTERNS.length);
  });
  test("autoRedactable patterns have a redactToken", () => {
    for (const p of PATTERNS) {
      if (p.autoRedactable) expect(p.redactToken).toBeTruthy();
    }
  });
});