gstack/test/redact-engine.test.ts
Garry Tan 9fd03fae9e v1.58.4.0 fix: high-priority community bug wave + PTY plan-mode smoke gate (#2077)
* fix(gbrain): stop forcing GBRAIN_PREPARE on transaction-mode poolers (#1965)

buildGbrainEnv auto-set GBRAIN_PREPARE=true whenever DATABASE_URL targeted
port 6543, and the /sync-gbrain capability check exported it for the rest
of the skill run. Both had the semantics inverted: gbrain auto-disables
prepared statements on transaction-mode poolers because they break every
write there ("prepared statement does not exist"); GBRAIN_PREPARE=true is
gbrain's documented override for SESSION-mode poolers on 6543, not a
requirement for transaction mode. The #1435 search symptom the auto-set
worked around was fixed gbrain-side.

Remove both force-sets. A caller-set GBRAIN_PREPARE (either value) still
passes through untouched, preserving the session-mode-on-6543 escape hatch.
isTransactionModePooler stays exported.
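The fix amounts to "detect the pooler, but don't act on it." The sketch below is a minimal illustration of that. The function bodies and the `Env` type are hypothetical. Only the names `buildGbrainEnv`, `isTransactionModePooler` and `GBRAIN_PREPARE` come from the commit, and detecting the pooler by parsing `DATABASE_URL` with `URL` is an assumption.

```typescript
// Hypothetical sketch: shapes are illustrative, not gstack's real code.
type Env = Record<string, string | undefined>;

// True when DATABASE_URL targets port 6543 (assumed detection rule).
function isTransactionModePooler(databaseUrl: string | undefined): boolean {
  if (!databaseUrl) return false;
  try {
    return new URL(databaseUrl).port === "6543";
  } catch {
    return false; // unparseable URL: not treated as a pooler
  }
}

// Builds the child env WITHOUT force-setting GBRAIN_PREPARE. gbrain disables
// prepared statements on transaction-mode poolers by itself, and a caller-set
// value ("true" or "false") passes through untouched.
function buildGbrainEnv(base: Env): Env {
  const env: Env = { ...base };
  // Removed by this fix:
  // if (isTransactionModePooler(env.DATABASE_URL)) env.GBRAIN_PREPARE = "true";
  return env;
}
```

A session-mode pooler on 6543 keeps working through the escape hatch: the caller exports `GBRAIN_PREPARE=true` and the builder leaves it alone.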

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(gbrain): classify probe timeout as its own status; sync proceeds instead of skipping (#1964)

The 5s engine probe misclassified healthy-but-slow engines (cold Supabase
pooler connections measured at 6.9-10.7s) as broken-config, so /sync-gbrain
silently skipped code+memory and told the user their config was malformed.

- New "timeout" status: probe killed at the deadline with no recognized
  stderr pattern. Default deadline is now 15s, overridable via
  GSTACK_GBRAIN_PROBE_TIMEOUT_MS (tests set 300ms against a fake that
  sleeps 2s).
- Sync stages PROCEED on timeout with a stderr warning naming the env knob;
  a genuinely-dead engine surfaces its real error at the first operation
  instead of a false config diagnosis.
- Consistency wherever "ok" gated behavior: gstack-gbrain-detect --is-ok
  exits 0 on timeout, and gen-skill-docs' detection gate accepts it, so a
  slow engine no longer silently suppresses brain-aware features.
- Status cache: key now includes the effective probe timeout (raising it
  invalidates a cached timeout) and GBRAIN_HOME; config detection honors
  GBRAIN_HOME so relocated-home users stop being misclassified as
  missing-config.
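The classification rules above can be sketched as below. This is a hedged illustration only. The `ProbeResult` shape, the example stderr patterns and the cache-key format are assumptions. The status names "ok", "timeout" and "broken-config" are taken from the text.

```typescript
// Hypothetical sketch of the probe status rules; not gstack's actual code.
type ProbeStatus = "ok" | "timeout" | "broken-config";

interface ProbeResult {
  exitCode: number | null; // null when the probe was killed
  killedAtDeadline: boolean;
  stderr: string;
}

// Illustrative stand-ins for whatever stderr patterns the real probe recognizes.
const KNOWN_ERROR_PATTERNS: RegExp[] = [/invalid connection string/i, /authentication failed/i];

function classifyProbe(r: ProbeResult): ProbeStatus {
  if (r.exitCode === 0) return "ok";
  const recognized = KNOWN_ERROR_PATTERNS.some((re) => re.test(r.stderr));
  // Killed at the deadline with nothing recognizable: slow, not broken.
  if (r.killedAtDeadline && !recognized) return "timeout";
  return "broken-config";
}

// Every consumer that used to gate on "ok" also accepts "timeout".
function isUsable(s: ProbeStatus): boolean {
  return s === "ok" || s === "timeout";
}

// The key covers the effective timeout and GBRAIN_HOME, so raising the
// timeout or relocating the home invalidates a cached result.
function statusCacheKey(timeoutMs: number, gbrainHome: string): string {
  return `probe:${timeoutMs}:${gbrainHome}`;
}
```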

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(bins): cygpath-normalize SCRIPT_DIR for bun imports; surface learnings-log errors (#1950)

Under Windows git-bash, pwd yields a POSIX path (/c/Users/...) that Bun on
Windows cannot resolve as an ES module specifier. gstack-learnings-log
interpolates SCRIPT_DIR into a bun -e import, so every invocation died with
"Cannot find module" — and 2>/dev/null swallowed the error, silently
dropping every AI-logged learning for Windows users.

- 3-line cygpath -m guard in gstack-learnings-log and gstack-question-log
  (which gains the same import shape in the next commit). Matches the
  duplicated IS_WINDOWS convention in setup; no shared shell lib exists.
- learnings-log adopts question-log's set +e / TMPERR capture pattern
  wholesale: validation errors now print to stderr. The old
  `if [ $? -ne 0 ]` check was dead code under set -euo pipefail — the
  script exited at the failing assignment before reaching it.
- New test/bin-windows-bun-import-paths.test.ts: static invariant (any
  bash bin interpolating $SCRIPT_DIR into a bun -e import must carry the
  guard) + behavioral end-to-end run invoked via `bash <bin>` — added to
  the windows-free-tests workflow list so the conversion is proven on the
  only platform where the bug exists.
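The real guard is three lines of bash calling `cygpath -m`. The TypeScript function below only illustrates the path rewrite involved: a git-bash drive path `/c/...` becomes the mixed form `C:/...`, which Bun on Windows can resolve. `toMixedPath` is a hypothetical name. Unlike `cygpath`, it ignores the MSYS/Cygwin mount table and handles only the drive-letter case.

```typescript
// Hypothetical sketch: mimics only the drive-letter case of `cygpath -m`.
function toMixedPath(posixPath: string): string {
  const m = /^\/([a-zA-Z])(\/.*)?$/.exec(posixPath);
  if (!m) return posixPath; // not a /<drive>/... path: leave untouched
  return `${m[1].toUpperCase()}:${m[2] ?? "/"}`;
}
```

For example, `toMixedPath("/c/Users/me/gstack/bin")` yields `C:/Users/me/gstack/bin`, a form that can be interpolated into an import specifier.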

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(question-log): dedupe INJECTION_PATTERNS via lib/jsonl-store (#1934)

bin/gstack-question-log carried a local copy of the injection-pattern list,
so pattern fixes to lib/jsonl-store.ts never propagated — including the
/override[:\s]/i false-positive fix arriving via community PR #1940.
Import the shared hasInjection instead (enabled by the previous commit's
cygpath guard). question-log also gets the lib's stricter superset
(human:, disregard, from-now-on, approve-all patterns).

Tests pin the contract in a #1940-order-independent way: an "Override:
ignore all previous instructions" header is rejected, "prose overrides the
deterministic table" is accepted, and a static invariant keeps local
INJECTION_PATTERNS duplicates out of the bin.
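The pinned contract distinguishes a header-style "Override:" from the verb "overrides" in ordinary prose. The sketch below illustrates that with a hypothetical pattern list. The real `INJECTION_PATTERNS` live in lib/jsonl-store.ts and are not reproduced here. Only the two test strings come from the commit.

```typescript
// Hypothetical patterns for illustration only; not the lib/jsonl-store.ts list.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+(all\s+)?previous\s+instructions/i,
  /^\s*override\s*:/im, // header-style "Override:" at line start, not the verb in prose
  /^\s*human\s*:/im,
];

// One shared predicate: every bin imports this instead of keeping a local copy,
// so a pattern fix lands everywhere at once.
function hasInjection(text: string): boolean {
  return INJECTION_PATTERNS.some((re) => re.test(text));
}
```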

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(security): community-pulse + both dashboards never report fake zeros (#1947)

The security-signaling surface failed open at three layers — every failure
mode read as a reassuring "0 attacks" / "0 installs":

- community-pulse edge function: supabase-js returns {data,error} without
  throwing, and all five queries discarded `error` — a DB outage produced
  real-looking zeros via the SUCCESS path, and the catch (also returning
  zeros with HTTP 200) was unreachable for query failures. Every query now
  destructures and throws; the catch serves the stale cache (marked
  "stale": true) when one exists, else 503 {"error":"pulse_unavailable"}.
  Success responses carry "status":"ok" so clients can distinguish
  authoritative data from legacy backends. NOTE: the edge function deploys
  out-of-band (supabase functions deploy community-pulse).
- gstack-security-dashboard: captures the HTTP status; non-200 / network
  failure / error body / missing section → "unknown — backend error";
  jq missing → "unknown — install jq" (the lossy grep fallback broke on
  nested arrays and under-reported attacks as zero — removed); a 200
  without the new marker shows figures with an "unverified (legacy
  backend)" note. Also fixes a latent display bug: the TOTAL grep matched
  the digit 7 inside "attacks_last_7_days" and misreported every count.
- gstack-community-dashboard: same class — curl || echo "{}" plus
  grep || echo "0" printed "Weekly active installs: 0" on any failure.
  Now "unknown — backend error (HTTP N)".
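The latent TOTAL bug is easy to reproduce. A "first run of digits" match over raw JSON hits the `7` in the key name before it reaches the value. The body shape below is hypothetical, and the regex only approximates what the removed grep did.

```typescript
// Hypothetical response body; only the key name comes from the commit.
const body = '{"security":{"attacks_last_7_days":42}}';

// Approximation of a first-digits-run grep over the raw text.
const naiveFirstNumber = body.match(/[0-9]+/)?.[0];

// Structured parsing reads the value instead of a digit inside a key.
const parsed = JSON.parse(body) as { security: { attacks_last_7_days: number } };
const total = parsed.security.attacks_last_7_days;
```

Here `naiveFirstNumber` is `"7"` while `total` is `42`. This is why the dashboard now requires jq rather than falling back to grep.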

test/security-dashboard-fallback.test.ts pins the matrix (200+marker,
200-legacy, 503, network failure) x (jq present, jq absent) for both bins:
"unknown" states never render as 0.
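The invariant in that matrix is that an "unknown" state never renders as 0. The sketch below expresses it as a pure function. It is a hypothetical TypeScript stand-in for the shell bins. The field name `weekly_active_installs` and the exact output strings are assumptions. The `"status":"ok"` marker and the 000 code for network failure follow the text.

```typescript
// Hypothetical renderer; the real bins are shell scripts using curl + jq.
interface Fetched {
  httpStatus: number | null; // null = network failure (curl reports 000)
  body: string;
}

function renderInstalls(f: Fetched): string {
  if (f.httpStatus !== 200) {
    return `unknown: backend error (HTTP ${f.httpStatus ?? "000"})`;
  }
  let data: unknown;
  try {
    data = JSON.parse(f.body);
  } catch {
    return "unknown: backend error (unparseable body)";
  }
  if (typeof data !== "object" || data === null) return "unknown: backend error";
  const d = data as { status?: unknown; error?: unknown; weekly_active_installs?: unknown };
  if (d.error !== undefined || typeof d.weekly_active_installs !== "number") {
    return "unknown: backend error"; // error body or missing section: never 0
  }
  const figure = `Weekly active installs: ${d.weekly_active_installs}`;
  // A 200 without the "status":"ok" marker may come from a legacy backend.
  return d.status === "ok" ? figure : `${figure} (unverified, legacy backend)`;
}
```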

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(telemetry): redact error_message spans before they leave the machine (#1947)

error_message was uploaded with only quote/newline escaping — stack traces
and failed-API errors can embed credentials, private paths, and hostnames,
and the sync path strips only _repo_slug/_branch.

New lib/redact-engine.ts export redactFindingSpans(): replaces EVERY
finding's span with <REDACTED-{id}> regardless of tier (applyRedactions is
the interactive PII-only path and exits nonzero on credential findings, so
it can't serve machine egress). Returns null when a span can't be located —
callers drop the whole payload rather than risk a leak.
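The splice-or-null contract can be sketched as below, under stated assumptions. The `Finding` shape and the function name are hypothetical and not the real lib/redact-engine.ts API. This version assumes spans do not overlap. The later review fix in this same PR coalesces overlapping spans.

```typescript
// Hypothetical shapes; the real Finding type in lib/redact-engine.ts may differ.
interface Finding {
  id: string;    // e.g. "github.pat"
  start: number; // offset into the scanned text
  match: string; // the matched bytes
}

// Replaces every finding's span with <REDACTED-{id}>, regardless of tier.
// Returns null if any span cannot be located, so the caller drops the payload.
// Assumes non-overlapping spans.
function redactSpansSketch(text: string, findings: Finding[]): string | null {
  for (const f of findings) {
    if (text.slice(f.start, f.start + f.match.length) !== f.match) return null;
  }
  // Splice right-to-left so earlier offsets stay valid.
  const sorted = [...findings].sort((a, b) => b.start - a.start);
  let out = text;
  for (const f of sorted) {
    out = out.slice(0, f.start) + `<REDACTED-${f.id}>` + out.slice(f.start + f.match.length);
  }
  return out;
}
```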

gstack-telemetry-log pipes error_message through it at LOG time, so the
local JSONL at rest is clean too; surrounding text survives for crash
triage. FAIL CLOSED: bun missing, engine error, or non-JSON-string output
all null the field. Tests pin: embedded ghp_ token → <REDACTED-github.pat>
with context intact; redactor unavailable → null; raw bytes on disk never
contain the token.
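The fail-closed rules for the telemetry field reduce to one guard function. In this hypothetical sketch, `redact` stands in for the bun-invoked redactor and `null` models "bun missing". The multi-line rule comes from the adversarial-review fixes later in this PR.

```typescript
// Hypothetical wrapper; `redact` stands in for the bun-invoked redactor.
type Redactor = (s: string) => unknown;

function sanitizeErrorMessage(raw: string, redact: Redactor | null): string | null {
  if (redact === null) return null; // redactor unavailable (e.g. bun missing)
  let out: unknown;
  try {
    out = redact(raw);
  } catch {
    return null; // engine error
  }
  if (typeof out !== "string") return null; // null or non-string output
  if (out.includes("\n")) return null; // would corrupt a one-line JSONL record
  return out;
}
```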

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(redact): prepush guard fails closed on git failure; /ship owns hook install (#1946)

Two gaps closed:

1. Fail closed. The git() helper returned "" on ANY non-zero exit or
   maxBuffer overflow (status null), addedLinesFor produced an empty
   string, and the push sailed through unscanned — fail-open on exactly
   the oversized-diff case where a large secret-bearing blob is most
   likely. The diff call now uses a strict variant that throws; main
   blocks with a clear message naming the GSTACK_REDACT_PREPUSH=skip
   escape valve. Probe calls (symbolic-ref, rev-parse, merge-base) keep
   the permissive helper — their failures are normal control flow.

2. Install path. The hook was installed by nothing ("opt-in, installed by
   nothing" was the issue's words). ./setup runs in the gstack checkout —
   the wrong repo for a per-project hook — so it gets a one-line hint
   only. /ship owns per-repo install: config redact_prepush_hook=true +
   hook missing → silent install (consent already given); config unset +
   no ~/.gstack/.redact-prepush-prompted marker → one-time machine-wide
   AskUserQuestion offer, answer persisted. ship/SKILL.md regenerated in
   this same commit (check-freshness bisect discipline).
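The /ship install rules can be read as a small decision table, sketched below. The field and function names are hypothetical, and the real logic lives in the ship template. The `hooksDirInsideGit` condition comes from the adversarial-review fixes later in this PR (core.hooksPath pointing at a committed `.husky/`).

```typescript
// Hypothetical decision table for /ship's per-repo pre-push hook install.
interface HookState {
  configValue: boolean | undefined; // redact_prepush_hook; undefined = unset
  hookInstalled: boolean;
  promptedMarkerExists: boolean;    // ~/.gstack/.redact-prepush-prompted
  hooksDirInsideGit: boolean;       // false for e.g. core.hooksPath=.husky/
}

type HookAction = "install" | "prompt" | "nothing";

function decideHookAction(s: HookState): HookAction {
  if (s.configValue === true) {
    // Consent already given: install silently, but never into a hooks dir
    // that lives in the working tree.
    return !s.hookInstalled && s.hooksDirInsideGit ? "install" : "nothing";
  }
  if (s.configValue === undefined && !s.promptedMarkerExists) return "prompt";
  return "nothing"; // explicit false, or already asked once on this machine
}
```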

Tests: unscannable diff (bogus SHAs) → exit 1 + valve named; empty-but-
successful diff → exit 0; static asserts pin setup as hint-only and the
ship template as the installer surface.
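A strict runner differs from the permissive helper in one way: every failure throws instead of collapsing to "". A successful empty diff still returns "". The sketch below shows the idea with Node's `spawnSync`. The names `checkStrict` and `gitStrict`, and the 64 MiB `maxBuffer`, are illustrative assumptions. The check is split out as a pure function so it can be exercised without git.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical sketch of a strict git runner that fails closed.
interface RunResult {
  status: number | null; // null when killed by a signal or on maxBuffer overflow
  stdout: string;
  error?: Error;
}

function checkStrict(r: RunResult, what: string): string {
  if (r.error) throw new Error(`${what}: ${r.error.message}`);
  // null !== 0, so signal kills and buffer overflows land here too.
  if (r.status !== 0) throw new Error(`${what}: exit status ${r.status}`);
  return r.stdout; // may legitimately be "" for an empty diff
}

function gitStrict(args: string[]): string {
  const r = spawnSync("git", args, { encoding: "utf8", maxBuffer: 64 * 1024 * 1024 });
  return checkStrict({ status: r.status, stdout: r.stdout, error: r.error }, `git ${args.join(" ")}`);
}
```

The caller catches the throw, prints a block message naming `GSTACK_REDACT_PREPUSH=skip`, and exits 1. Probe calls keep the permissive helper.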

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(redact): six new credential patterns — GitLab, HuggingFace, npm, DigitalOcean, Bearer, GCP SA (#1946)

Coverage gaps from the #1946 security review, including token types for
tooling gstack itself drives (glab):

HIGH (block): gitlab.token (glpat-/glptt-/gldt-), huggingface.token (hf_),
npm.token (npm_), digitalocean.token (dop_v1_), gcp.service_account (the
JSON-escaped "private_key" form that dodges pem.private_key's literal-block
match when minified, confirmed by "private_key_id" proximity).
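The prefix-style HIGH shapes can be sketched with one regex each. These regexes are hypothetical, written to agree with the positive and negative cases in this PR's engine tests. The shipped shapes in lib/redact-patterns.ts may differ. Each regex is a literal prefix plus a single character-class run with no nested quantifiers, so none of them can backtrack catastrophically.

```typescript
// Hypothetical regexes consistent with this PR's test fixtures; not the shipped ones.
const NEW_HIGH_PATTERNS: Record<string, RegExp> = {
  "gitlab.token": /\bgl(?:pat|ptt|dt)-[A-Za-z0-9_-]{20,}/,
  "huggingface.token": /\bhf_[A-Za-z0-9]{30,}/,
  "npm.token": /\bnpm_[A-Za-z0-9]{36}\b/,
  "digitalocean.token": /\bdop_v1_[a-f0-9]{64}\b/,
};

function highIds(text: string): string[] {
  return Object.entries(NEW_HIGH_PATTERNS)
    .filter(([, re]) => re.test(text))
    .map(([id]) => id);
}
```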

MEDIUM (warn): auth.bearer — the most FP-prone shape in the set (docs are
full of "Authorization: Bearer <token>"), so it requires header-context
proximity and the same entropy>=3.0 + placeholder validator recipe as
env.kv. "Bearer YOUR_TOKEN_HERE" never fires; calibration over coverage,
per the cries-wolf principle.
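The auth.bearer recipe has three gates: header context, then a placeholder check, then Shannon entropy of at least 3.0 bits per character. The sketch below is a hedged illustration. It defines a local `shannonEntropy` (the real one is exported from lib/redact-patterns.ts). The placeholder heuristic and the header regex are hypothetical, not the real `isPlaceholderSpan` or pattern.

```typescript
// Shannon entropy in bits per character: H = -sum(p * log2 p).
function shannonEntropy(s: string): number {
  const chars = [...s];
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let h = 0;
  for (const n of counts.values()) {
    const p = n / chars.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Illustrative placeholder heuristic, not the real isPlaceholderSpan.
function looksLikePlaceholder(token: string): boolean {
  return /^[A-Z0-9_]+$/.test(token) || /YOUR|PLACEHOLDER|EXAMPLE|CHANGEME/i.test(token);
}

// Requires "Authorization:" header context; "${VAR}" never matches the token class.
const BEARER_IN_HEADER = /Authorization:\s*Bearer\s+([\w.~+\/-]+=*)/i;

function flagsBearer(text: string): boolean {
  const m = BEARER_IN_HEADER.exec(text);
  if (!m) return false;
  const token = m[1];
  return !looksLikePlaceholder(token) && shannonEntropy(token) >= 3.0;
}
```

As a worked value, `shannonEntropy("changeme")` is 6 × (1/8 × 3) + (1/4 × 2) = 2.75. That falls below the threshold, which is why low-variety strings like this never fire.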

All shapes are linear-time; test/redact-pattern-lint.test.ts covers them
automatically. Engine tests add positive + placeholder-negative cases per
pattern.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: coverage-audit additions for the fix wave

Ship Step 7 gap-fill (all passing, 248 tests across the touched suites):
memory + dream stage probe-timeout proceeds, gbrain-detect override paths,
stale-flag passthrough, 200-body-missing-.security fail-closed case,
telemetry redaction edges, and credential-pattern edge cases.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: pre-landing review fixes

Review army findings (1 critical, auto-fixed with regression tests):

- CRITICAL (security specialist, verified live): redactFindingSpans spliced
  only the regex capture span, and pem.private_key / gcp.service_account
  capture just the BEGIN-header — the key body survived "redaction" and
  shipped via telemetry. Marker-only patterns now drop the whole payload
  (null, fail closed). Overlapping spans (Bearer+JWT on the same bytes) are
  coalesced before splicing so stale offsets can't leave partial secret
  bytes behind.
- gitStrict: drop the dead `|| r.status === null` disjunct (null !== 0
  already covers it); add the signal-kill/null-status regression test the
  docstring promised.
- security-dashboard human mode flags stale snapshots ("figures may be out
  of date") instead of presenting frozen counts as current.
- community-dashboard marker check uses jq when available — the grep-only
  variant misclassified whitespaced/reserialized bodies as legacy.
- telemetry fail-closed test now shadows bun with a failing stub
  (deterministic on any host layout); stale "five status cases" describe
  title renamed.
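The two redaction fixes can be sketched together under stated assumptions. The `Span` shape, the `markerOnly` flag and the choice to keep the earliest span's id when merging are hypothetical. Merging overlapping spans before a right-to-left splice means no stale offset can leave partial secret bytes behind. A marker-only finding nulls the payload.

```typescript
// Hypothetical sketch; not the real redactFindingSpans implementation.
interface Span {
  id: string;
  start: number; // inclusive
  end: number;   // exclusive
  markerOnly?: boolean; // span covers only a header (e.g. a PEM BEGIN line)
}

function coalesce(spans: Span[]): Span[] {
  const sorted = [...spans].sort((a, b) => a.start - b.start || b.end - a.end);
  const merged: Span[] = [];
  for (const s of sorted) {
    const last = merged[merged.length - 1];
    if (last && s.start < last.end) {
      last.end = Math.max(last.end, s.end); // keep the earliest span's id
    } else {
      merged.push({ ...s });
    }
  }
  return merged;
}

function redactCoalesced(text: string, spans: Span[]): string | null {
  if (spans.some((s) => s.markerOnly)) return null; // key body would survive: fail closed
  let out = text;
  for (const s of coalesce(spans).reverse()) {
    out = out.slice(0, s.start) + `<REDACTED-${s.id}>` + out.slice(s.end);
  }
  return out;
}
```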

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: adversarial review fixes (Claude + Codex cross-model passes)

Both adversarial passes ran against the wave; every FIXABLE finding landed
with a regression test:

- probeTimeoutMs clamps to >=1ms: a fractional override floored to 0, and
  execFileSync treats timeout:0 as NO timeout — the probe that exists to
  bound hangs could hang forever (found by both models independently).
- /ship silent hook install now requires the hooks dir to live inside
  .git: with core.hooksPath (husky's COMMITTED .husky/), the chaining
  installer would have renamed the team's committed pre-push and written a
  machine-local wrapper into the working tree (found by both models).
- gstack-config gbrain-refresh accepts the "timeout" status — the last
  consumer still gating on literal "ok" (Codex); gstack-gbrain-detect's
  config-derived fields honor GBRAIN_HOME so the detection JSON can't
  report status ok alongside config_exists false (Codex).
- prepush: a remote sha absent locally (shallow clone / stale fetch) falls
  back to the merge-base/empty-tree range, which scans MORE and never blocks
  a legitimate push (a block there would train users toward --no-verify).
- dashboards: curl's own 000 no longer doubles to "HTTP 000000"; the
  community dashboard flags stale snapshots like the security one; array
  sections parse via jq (the sed/grep loops truncated at the first ']');
  the no-jq marker grep tolerates whitespace.
- telemetry: multi-line redactor output nulls the field instead of
  corrupting the JSONL record; setup's hint fires only when the config key
  is genuinely unset (an explicit false is a recorded decline); the /ship
  prompt marker honors GSTACK_HOME.
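The probeTimeoutMs fix is a one-line clamp, shown in context below. In this hypothetical sketch, falling back to the 15s default on an empty, non-numeric or non-positive override is an assumption. The clamp itself follows the text: a fractional override such as `0.5` would otherwise floor to 0, which `execFileSync` treats as no timeout at all.

```typescript
// Hypothetical sketch of the override parsing; not gstack's actual code.
const DEFAULT_PROBE_TIMEOUT_MS = 15_000;

function probeTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env.GSTACK_GBRAIN_PROBE_TIMEOUT_MS;
  if (raw === undefined || raw.trim() === "") return DEFAULT_PROBE_TIMEOUT_MS;
  const n = Number(raw);
  if (!Number.isFinite(n) || n <= 0) return DEFAULT_PROBE_TIMEOUT_MS; // assumed fallback
  return Math.max(1, Math.floor(n)); // never 0: 0 would disable the deadline
}
```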

Kept as designed (cross-model tension noted): Bearer stays MEDIUM in the
prepush gate — a HIGH Bearer would block every docs example; the entropy
validator can't eliminate that FP class, and MEDIUM warns visibly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.11.0)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: P1 TODO — eval harness live progress + incremental persistence

Root-caused during this ship: a killed eval run was indistinguishable from a
healthy one for hours (per-file output buffering across mega test files, no
incremental eval-store writes, no honest liveness signal). Full context and
starting points in the entry.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: fix operational-learning E2E fixture — copy lib/jsonl-store.ts

Pre-existing breakage, proven on main: gstack-learnings-log has imported
lib/jsonl-store.ts (shared injection patterns) since v1.57.5.0 / #1910, but
the fixture copies only the bin scripts — the bin exits 1 before writing
anything, on main silently (stderr swallowed) and on this branch loudly
(the #1950 error-surfacing made the four-day-old failure visible). A real
install always ships bin/ and lib/ together; the fixture now does too.
Verified: the fixture-shaped invocation writes the learning (exit 0) with
lib present, exits 1 on both main and this branch without it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(ios-qa): isolate E2E tests under --concurrent (3 real races)

The ios-qa E2E file failed intermittently under `bun test --concurrent`
(the eval harness default). Three distinct shared-state races, all fixed:

1. Shared pidfile: a module-level `workDir` reassigned in beforeEach was
   clobbered by parallel tests, so concurrent daemons collided on the same
   pidfile and the loser returned `already_running`. Each test now gets its
   own dir via makeWorkDir().
2. process.env path globals: tests set GSTACK_IOS_AUDIT_PATH /
   _ATTEMPTS_PATH / _ALLOWLIST_PATH on the shared process env; concurrent
   tests stomped each other's audit/attempts destinations. Threaded
   auditPath/attemptsPath/allowlistPath through DaemonOptions (and
   mintForCaller) as explicit args — env is no longer load-bearing.
3. afterEach cleanup race: the per-test cleanup drained a shared dir array,
   so the first test to finish deleted still-running tests' workDirs
   mid-assertion. Moved to afterAll (cleans once, after all settle).

Verified: 5/5 clean full-suite runs at --max-concurrency 15 (was
intermittent); daemon unit suite 91/91; daemon source compiles. The paths
default to the env-derived locations when options are omitted, so the
production CLI path is unchanged.
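The three isolation fixes follow one pattern: return per-test state instead of sharing it. The sketch below is a minimal illustration. `makeWorkDir`, `auditPath`, `attemptsPath` and `allowlistPath` come from the text. `resolvePaths` and `cleanupAll` are hypothetical. The env names `GSTACK_IOS_ATTEMPTS_PATH` and `GSTACK_IOS_ALLOWLIST_PATH` are assumed expansions of the commit's `_ATTEMPTS_PATH` / `_ALLOWLIST_PATH` shorthand.

```typescript
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface DaemonPathOptions {
  auditPath?: string;
  attemptsPath?: string;
  allowlistPath?: string;
}

// Fix 2: explicit options win; env is only the fallback for the production CLI.
function resolvePaths(opts: DaemonPathOptions, env: Record<string, string | undefined>) {
  return {
    auditPath: opts.auditPath ?? env.GSTACK_IOS_AUDIT_PATH,
    attemptsPath: opts.attemptsPath ?? env.GSTACK_IOS_ATTEMPTS_PATH,
    allowlistPath: opts.allowlistPath ?? env.GSTACK_IOS_ALLOWLIST_PATH,
  };
}

// Fix 1: each test gets its own dir (and so its own pidfile), returned
// rather than stored in a shared module-level variable.
const createdDirs: string[] = [];
function makeWorkDir(): string {
  const dir = mkdtempSync(join(tmpdir(), "ios-qa-"));
  createdDirs.push(dir);
  return dir;
}

// Fix 3: clean up once, after every test has settled (call from afterAll).
function cleanupAll(): void {
  for (const d of createdDirs.splice(0)) rmSync(d, { recursive: true, force: true });
}
```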

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test(pty): pin spawned claude to EVALS model chain (default claude-sonnet-4-6)

launchClaudePty spawned the interactive `claude` TUI with no --model flag, so
the child inherited the operator's ~/.claude/settings.json model. On a
slow-thinking model that meant 5+ min of extended thinking on empty plan-mode
context, timing out the plan-mode smoke tests regardless of contention. Pin the
model via opts.model ?? EVALS_MODEL ?? 'claude-sonnet-4-6' — byte-identical to
session-runner.ts:144, so PTY and `claude -p` evals always agree.

Pushed before extraArgs (last flag wins, so a per-test --model still overrides).
Placement leaves the spawn region byte-stable for a clean merge with the
in-flight hermetic-env branch. Plumbed model through the three plan-skill
wrappers. Static-grep tripwires guard the pin, its fallback chain, the
before-extraArgs ordering, and all three wrapper forwards.
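The pin and its ordering can be sketched as below. `buildClaudeArgs` and `PtyOpts` are hypothetical names. Reading `EVALS_MODEL` from the environment is an assumption about where the real value comes from. The fallback chain and the before-extraArgs placement follow the text.

```typescript
// Hypothetical sketch of the argv construction for the spawned `claude` TUI.
interface PtyOpts {
  model?: string;
  extraArgs?: string[];
}

function buildClaudeArgs(opts: PtyOpts, env: Record<string, string | undefined>): string[] {
  const model = opts.model ?? env.EVALS_MODEL ?? "claude-sonnet-4-6";
  // --model goes BEFORE extraArgs, so a per-test --model (the later flag) wins.
  return ["--model", model, ...(opts.extraArgs ?? [])];
}
```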

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(pty): detect markdown bold-bullet prose AUQs (fixes office-hours smoke)

office-hours auto-mode renders its mode question as `- **Building a startup**`
markdown bullets (office-hours/SKILL.md.tmpl:102) with no letter/number marker.
isProseAUQVisible only matched `A)`-style lettered or `1.`-style numbered
options, so the question went undetected: the model surfaced it at ~2m19s
(well under the 300s budget) but the harness kept scoring the run "working"
off the spinner glyphs and timed out — a false timeout on a question that was
already on screen.

Add Pattern 3: when an interrogative line ('?') is present AND 3+ bold-bullet
markers (`- **`) appear in the 4KB tail, classify as a prose AUQ. Bold is the
discriminator vs incidental prose bullets; the line anchor is dropped (stripAnsi
can collapse option lines) and the existing `❯ 1.` cursor gate still defers to a
live native list. Wires through the existing classifyVisible 'asked' path and the
timeout high-water-mark, so office-hours now classifies 'asked' instead of
'timeout'. Five unit cases: the office-hours render passes; no-'?', <3-bullet,
plain-bullet, and native-cursor cases stay false.
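Pattern 3 can be sketched as a pure predicate over the screen buffer. The helper names below are hypothetical, and the real `isProseAUQVisible` may differ. Treating any `?` in the tail as the interrogative signal is an assumption, chosen because stripAnsi can collapse line breaks. `slice(-4096)` counts characters, so it only approximates the 4KB tail.

```typescript
// Hypothetical sketch of Pattern 3; not the harness's actual code.
function stripAnsi(s: string): string {
  return s.replace(/\x1b\[[0-9;?]*[A-Za-z]/g, "");
}

function isBoldBulletProseAUQ(screen: string): boolean {
  const tail = stripAnsi(screen).slice(-4096);
  if (tail.includes("❯ 1.")) return false; // a live native list: defer to it
  const hasQuestion = tail.includes("?");
  const boldBullets = tail.split("- **").length - 1; // no line anchor on purpose
  return hasQuestion && boldBullets >= 3;
}
```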

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(pty): detect stripAnsi-collapsed prose AUQs + judge spinner-precedence

The plan-eng/plan-design plan-mode + finding-floor smokes timed out even when
the skill HAD rendered a complete prose AskUserQuestion and was waiting: the PTY
strips cursor-positioning escapes, collapsing the option newlines/spaces so
"A) ..." arrives as "A(recommended)" / "-B:" and "Reply with A, B, or C" as
"ReplywithA,B,orC". Every line-anchored detector (Patterns 1-3) returns false on
those bytes, so proseAUQEverObserved never latched and the run timed out on a
question that was already on screen.

Add Pattern 4/5: a two-signal collapsed-form detector — a reply/recommendation
marker (space-insensitive "reply with [A-D]", "Recommendation:", or
"(recommended)") AND 2+ distinct A-D letters each punctuated by ) : or (. The
conjunction is what separates a real AUQ from incidental report prose; verified
true on the verbatim failing-run buffers where Patterns 1-3 return false.
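The two-signal conjunction can be sketched as below. The function name is hypothetical. The lookbehind, which rejects letters glued to a preceding word such as the `C` in `orC`, is this sketch's own assumption about how to count option letters. The sample buffer imitates the collapsed shapes the commit describes.

```typescript
// Hypothetical sketch of the collapsed-form detector (Patterns 4/5).
function isCollapsedProseAUQ(tail: string): boolean {
  const squashed = tail.replace(/\s+/g, "").toLowerCase();
  const hasReplyMarker =
    /replywith[a-d]/.test(squashed) || // space-insensitive "reply with A"
    /Recommendation:/.test(tail) ||
    /\(recommended\)/i.test(tail);
  // Distinct A-D letters each followed by ")", ":" or "(", not inside a word.
  const letters = new Set<string>();
  for (const m of tail.matchAll(/(?<![A-Za-z])([A-D])[):(]/g)) letters.add(m[1]);
  return hasReplyMarker && letters.size >= 2;
}
```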

Also fix the Haiku judge spinner bias: of 614 verdicts, 569 were 'working' and
95 of those noted a question was visible — Claude Code keeps the spinner
animating at an idle prose decision, so the judge coin-flipped. Add a precedence
override: when an option list AND a Recommendation/Reply instruction are both
visible, classify WAITING even with spinner glyphs. Kept the strict dual-signal
gate (never option-list-alone) so auto-decide-preserved doesn't flip.

5 unit tests pin the two-signal contract (2 true on real collapsed bytes, 3
false guards). 90 -> 95 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(plan-review): ask-first scope gate for plan-eng + plan-design review

On an empty/cold invocation, plan-eng-review and plan-design-review would dive
straight into repo exploration (plan-eng) or a 7-pass mockup+audit (plan-design)
and only ask the user much later, if at all. plan-ceo-review already asks first
via an unconditional Step-0 gate and behaves well; these two did not.

Add a hard-STOP scope gate as the FIRST operational instruction in each skill
(above the design-doc check / pre-review audit / mockup defaults it explicitly
overrides): the first tool call must be AskUserQuestion confirming the review
target, before any git/Read/Grep/Glob/Bash or mockup generation. Under
--disallowedTools the options render as plain column-0 lettered prose with a
Recommendation + "Reply with A, B, or C" line so the answer is detectable.

This is correct cold-start UX (confirm what to review before grinding a full
review on nothing) and it is the product half of the plan-mode smoke fix; the
harness collapsed-form detector is the deterministic half that catches the ask
however it renders. Templates + regenerated SKILL.md (default variant).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(tiers): reclassify stochastic plan-eng/plan-design ask-first smokes as periodic

plan-eng-review and plan-design-review run a long explore/audit before their
first AskUserQuestion, so whether the plan-mode + finding-floor smokes reach a
terminal outcome within the 300s/600s budget depends on stochastic ask-first
compliance (measured ~50-67%/run even with the hardened gate). Per the
"non-deterministic -> periodic" tiering rule, move the four affected smokes
(plan-eng/plan-design review-plan-mode + finding-floor) to periodic.

The deterministic harness fix (collapsed-form detector + judge precedence) and
the ask-first gate lift these from always-failing to mostly-passing and are the
real product+harness improvements; periodic monitoring tracks the rate weekly
without blocking PRs on an LLM coin-flip. plan-ceo/plan-devex ask-first reliably
and stay gate-tier.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(evals): gate the deterministic PTY plan-mode smokes in CI

The real-PTY plan-mode smokes never ran in CI — the gate was local-only. Add an
e2e-pty-plan-smoke matrix suite running the two deterministically-reliable ones
(office-hours-auto-mode, plan-mode-no-op) so a regression there blocks PRs. The
stochastic plan-eng/plan-design ask-first smokes stay periodic (touchfiles
E2E_TIERS) and are not CI-gated.

A fresh CI container has no ~/.claude.json, so the spawned interactive `claude`
would wedge on the onboarding + API-key-approval dialog. Add a scoped seed step
(hasCompletedOnboarding + key approval, its own ANTHROPIC_API_KEY env) before the
run — mirrors what the hermetic E2E child env seeds. Per-suite timeout override
(35 min) via matrix.suite.timeout so the PTY suite has headroom for --retry 2
without bumping the other 12 suites. Report runner count 12 -> 13.

Validate via workflow_dispatch before relying on the gate (PTY-in-CI is new).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci(evals): install gstack skill registry for the PTY smoke suite

The first dry-run of e2e-pty-plan-smoke failed: the spawned interactive `claude`
printed "Unknown command: /plan-ceo-review". .claude/skills is gitignored, so a
fresh CI checkout has no gstack skill registry and the TUI can't resolve
/office-hours or /plan-ceo-review.

Add a Register step (scoped to the suite, after Seed, before Run) that mirrors
setup's --no-prefix user-scoped registry minimally: $HOME/.claude/skills/gstack
-> repo (resolves the preambles' absolute ~/.claude/skills/gstack/bin/* and
<skill>/sections/* paths) + per-skill SKILL.md/sections symlinks for the two
skills these tests invoke. HOME is /github/home in this container and the runner
adds no HOME/CLAUDE_CONFIG_DIR override (no hermetic mode), so $HOME is the right
anchor — the Seed step already proved claude reads it. No ./setup (binary build
+ Chromium + fonts + /dev/tty prompt); SKILL.md + bin/ + sections/ are committed.

Self-validating: fails the step loudly on a dangling symlink or missing
`name:` frontmatter, so a moved target surfaces here instead of as a silent
35-min "Unknown command" timeout.
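The Register step's self-checks are shell in the workflow. The TypeScript rendering below is only a hypothetical illustration of the two checks it describes. The function names and the frontmatter regex are assumptions. `existsSync` follows symlinks, so it returns false for a dangling link even though `lstatSync` still sees the link itself.

```typescript
import { existsSync, lstatSync, readFileSync } from "node:fs";

// True when `path` is a symlink whose target no longer exists.
function isDanglingSymlink(path: string): boolean {
  try {
    return lstatSync(path).isSymbolicLink() && !existsSync(path);
  } catch {
    return false; // the path itself does not exist
  }
}

// A SKILL.md must open with frontmatter that carries a `name:` key.
function hasNameFrontmatter(skillMd: string): boolean {
  const m = /^---\r?\n([\s\S]*?)\r?\n---/.exec(skillMd);
  return m !== null && /^name:\s*\S/m.test(m[1]);
}

// Fails loudly instead of letting a moved target surface as "Unknown command".
function validateSkill(skillMdPath: string): void {
  if (isDanglingSymlink(skillMdPath)) throw new Error(`dangling symlink: ${skillMdPath}`);
  if (!hasNameFrontmatter(readFileSync(skillMdPath, "utf8"))) {
    throw new Error(`missing name: frontmatter in ${skillMdPath}`);
  }
}
```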

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.58.4.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-21 07:15:19 -07:00

455 lines
19 KiB
TypeScript

/**
 * Unit tests for lib/redact-engine.ts + lib/redact-patterns.ts.
 *
 * One positive test per pattern, plus FP-filters, validators (Luhn/entropy/
 * RFC1918), email allowlist, no-promotion visibility semantics, tool-fence
 * degrade, normalization (zero-width / homoglyph / entity), oversize fail-closed,
 * and pure-function purity.
 */
import { describe, test, expect } from "bun:test";
import {
  scan,
  exitCodeFor,
  maskPreview,
  normalizeWithMap,
  redactFindingSpans,
  type RepoVisibility,
} from "../lib/redact-engine";
import {
  PATTERNS,
  luhnValid,
  shannonEntropy,
  isPublicIPv4,
  isPlaceholderSpan,
} from "../lib/redact-patterns";

function ids(text: string, vis: RepoVisibility = "private"): string[] {
  return scan(text, { repoVisibility: vis }).findings.map((f) => f.id);
}

describe("HIGH credential patterns", () => {
  const cases: Array<[string, string]> = [
    ["aws.access_key", "key = AKIA1234567890ABCDEF"],
    ["aws.secret_key", "aws_secret_access_key = AbCdEfGhIjKlMnOpQrStUvWxYz0123456789AbCd"],
    ["github.pat", "token ghp_" + "1234567890abcdefghijklmnopqrstuvwxyz"],
    ["github.oauth", "gho_" + "1234567890abcdefghijklmnopqrstuvwxyz"],
    ["github.server", "ghs_1234567890abcdefghijklmnopqrstuvwxyz"],
    ["github.fine_grained", "github_pat_" + "A".repeat(82)],
    ["anthropic.key", "sk-ant-" + "api03-abcdefghij1234567890XYZ"],
    ["openai.key", "sk-proj-" + "a".repeat(40)],
    ["sendgrid.key", "SG." + "a".repeat(22) + "." + "b".repeat(43)],
    ["stripe.secret", "sk_live_" + "a".repeat(30)],
    ["slack.token", "xox" + "b-1234567890-abcdefghijklmnop"],
    ["slack.webhook", "https://hooks.slack.com/services/T00000000/B11111111/" + "a".repeat(24)],
    ["discord.webhook", "https://discord.com/api/webhooks/123456789012345678/" + "a".repeat(60)],
    ["pem.private_key", "-----BEGIN RSA PRIVATE KEY-----"],
    // #1946 coverage-gap additions
    ["gitlab.token", "remote: glpat-" + "Ab12Cd34Ef56Gh78Ij90"],
    ["gitlab.token", "trigger glptt-" + "a1b2c3d4e5f6a7b8c9d0e1f2"],
    ["gitlab.token", "deploy gldt-" + "Zy98Xw76Vu54Ts32Rq10"],
    ["huggingface.token", "hf_" + "AbCdEfGhIjKlMnOpQrStUvWxYz012345"],
    ["npm.token", "npm_" + "a1B2c3D4e5F6g7H8i9J0k1L2m3N4o5P6q7R8"],
    ["digitalocean.token", "dop_v1_" + "0123456789abcdef".repeat(4)],
    [
      "gcp.service_account",
      '{"private_key_id": "abc123", "private_key": "-----BEGIN PRIVATE KEY-----\\nMIIE..."}',
    ],
  ];

  for (const [id, text] of cases) {
    test(`flags ${id}`, () => {
      expect(ids(text)).toContain(id);
    });
  }

  // #1868: modern OpenAI keys use base64url bodies (with - and _). The old
  // [A-Za-z0-9]{32,} regex stopped at the first separator and missed them all,
  // failing a HIGH credential OPEN through the redaction gate.
  test("openai.key flags modern sk-proj-/sk-svcacct-/sk-admin- shapes (#1868)", () => {
    const missed = [
      "sk-proj-Ab12_Cd34-Ef56Gh78Ij90Kl12Mn34Op56Qr78St90Uv",
      "sk-svcacct-abc_def-ghijklmnopqrstuvwxyz0123456789ABCDEF",
      "sk-admin-AAAA_BBBB-CCCC_DDDD-EEEE_FFFF-GGGG_HHHH1234",
    ];
    for (const key of missed) {
      expect(ids(`OPENAI_API_KEY=${key}`)).toContain("openai.key");
    }
    // legacy contiguous shape still flags
    expect(ids("sk-proj-" + "a".repeat(40))).toContain("openai.key");
  });

  test("openai.key does not over-match prose / malformed sk- strings (#1868 calibration)", () => {
    // HIGH tier BLOCKS, so false positives on prose are costly. None of these
    // should flag as openai.key.
    const benign = [
      "the sk-learning-rate-schedule-was-tuned-carefully", // hyphenated prose
      "sk--double-dash-typo-not-a-real-key",
      "use sk-proj for the project prefix in docs", // no body
      "sk-short", // too short, no prefix
    ];
    for (const text of benign) {
      expect(ids(text)).not.toContain("openai.key");
    }
  });

  test("twilio.auth_token needs an SID nearby", () => {
    const sid = "AC" + "a".repeat(32);
    const tok = "b".repeat(32);
    expect(ids(`account ${sid} token ${tok}`)).toContain("twilio.auth_token");
    // bare 32-hex with no SID nearby should NOT flag as twilio
    expect(ids(`random ${tok} here`)).not.toContain("twilio.auth_token");
  });

  test("db.url_with_password flags real password, skips placeholder/env-var", () => {
    expect(ids("postgres://user:s3cretP@ss@db.example.com/app")).toContain("db.url_with_password");
    expect(ids("postgres://user:${DB_PASSWORD}@host/app")).not.toContain("db.url_with_password");
  });

  test("all HIGH patterns block (exit 3)", () => {
    const r = scan("AKIA1234567890ABCDEF", { repoVisibility: "private" });
    expect(exitCodeFor(r)).toBe(3);
  });
});

describe("MEDIUM demoted credential-shaped patterns (TENSION-1)", () => {
  test("stripe.publishable is MEDIUM not HIGH", () => {
    const f = scan("pk_live_" + "a".repeat(30), { repoVisibility: "private" }).findings.find(
      (x) => x.id === "stripe.publishable",
    );
    expect(f?.tier).toBe("MEDIUM");
  });

  test("google.api_key is MEDIUM", () => {
    const f = scan("AIza" + "a".repeat(35), { repoVisibility: "private" }).findings.find(
      (x) => x.id === "google.api_key",
    );
    expect(f?.tier).toBe("MEDIUM");
  });

  test("jwt is MEDIUM", () => {
    const jwt = "eyJhbGciOiJ.eyJzdWIiOiI." + "x".repeat(20);
    const f = scan(jwt, { repoVisibility: "private" }).findings.find((x) => x.id === "jwt");
    expect(f?.tier).toBe("MEDIUM");
  });

  test("env.kv fires on high-entropy, skips placeholder", () => {
    expect(ids("API_TOKEN=8Fk2pQ9vXz4wL7mN3rT6yB1cD5eG0hJ")).toContain("env.kv");
    expect(ids("API_KEY=changeme")).not.toContain("env.kv");
    expect(ids("API_KEY=${MY_VAR}")).not.toContain("env.kv");
  });

  // #1946: Bearer is the most FP-prone shape in the wave: docs and examples
  // are full of "Authorization: Bearer <token>". MEDIUM + header proximity +
  // the env.kv entropy recipe keep it calibrated.
  test("auth.bearer fires on a high-entropy token in header context", () => {
    const text = "curl -H 'Authorization: Bearer 8Fk2pQ9vXz4wL7mN3rT6yB1cD5eG0hJq'";
    const f = scan(text, { repoVisibility: "private" }).findings.find(
      (x) => x.id === "auth.bearer",
    );
    expect(f).toBeDefined();
    expect(f?.tier).toBe("MEDIUM");
  });

  test("auth.bearer skips placeholders and env interpolations", () => {
    expect(ids("Authorization: Bearer YOUR_TOKEN_HERE_PLACEHOLDER")).not.toContain("auth.bearer");
    expect(ids("Authorization: Bearer ${ACCESS_TOKEN_FROM_ENV}")).not.toContain("auth.bearer");
  });

  test("auth.bearer requires header context (bare 'Bearer x' prose doesn't fire)", () => {
    expect(ids("the Bearer 8Fk2pQ9vXz4wL7mN3rT6yB1cD5eG0hJq walked in")).not.toContain(
      "auth.bearer",
    );
  });
});

describe("#1946 pattern negatives (placeholders never fire)", () => {
  test("short or placeholder shapes don't trip the new HIGH patterns", () => {
    expect(ids("glpat-xxxx")).not.toContain("gitlab.token");
    expect(ids("hf_token")).not.toContain("huggingface.token");
    expect(ids("npm_install")).not.toContain("npm.token");
    expect(ids("dop_v1_short")).not.toContain("digitalocean.token");
    // pem header WITHOUT the GCP JSON shape stays pem.private_key only.
    expect(ids("-----BEGIN PRIVATE KEY-----")).not.toContain("gcp.service_account");
  });
});

describe("PII patterns", () => {
  test("email flags + is autoRedactable", () => {
    const f = scan("ping alice@corp.io please", { repoVisibility: "private" }).findings.find(
      (x) => x.id === "pii.email",
    );
    expect(f).toBeTruthy();
    expect(f?.autoRedactable).toBe(true);
  });

  test("email allowlist: example.com, noreply, self, repo-public", () => {
    expect(ids("see user@example.com")).not.toContain("pii.email");
    expect(ids("from noreply@github.com")).not.toContain("pii.email");
    expect(
      scan("me@garry.dev", { repoVisibility: "private", selfEmail: "me@garry.dev" }).findings,
    ).toHaveLength(0);
    expect(
      scan("bob@acme.co", { repoVisibility: "private", repoPublicEmails: ["bob@acme.co"] }).findings,
    ).toHaveLength(0);
  });

  test("phone E.164", () => {
    expect(ids("call +14155550123 now")).toContain("pii.phone.e164");
  });

  test("ssn flags valid, skips 000 octet", () => {
    expect(ids("ssn 123-45-6789")).toContain("pii.ssn");
    expect(ids("000-12-3456")).not.toContain("pii.ssn");
  });

  test("credit card needs Luhn", () => {
    expect(ids("card 4111111111111111")).toContain("pii.cc");
    expect(ids("num 4111111111111112")).not.toContain("pii.cc");
  });

  test("public IP flagged, RFC1918 skipped", () => {
    expect(ids("connect 8.8.8.8")).toContain("pii.ip_public");
    expect(ids("local 192.168.1.5")).not.toContain("pii.ip_public");
    expect(ids("local 10.0.0.1")).not.toContain("pii.ip_public");
  });
});

describe("internal + legal patterns", () => {
  test("internal hostname", () => {
    expect(ids("db1.corp internal host")).toContain("internal.hostname");
  });
  test("localhost url with path", () => {
    expect(ids("hit http://localhost:8080/admin/secrets")).toContain("internal.url_private");
  });
  test("NDA marker", () => {
    expect(ids("This is CONFIDENTIAL material")).toContain("legal.nda_marker");
  });
  test("named criticism needs a capitalized full name nearby", () => {
    expect(ids("John Smith is incompetent at this")).toContain("legal.named_criticism");
    expect(ids("the build is incompet019ently configured".replace("019", ""))).not.toContain(
      "legal.named_criticism",
    );
  });
});
describe("LOW patterns surface only", () => {
  test("user path is LOW", () => {
    const f = scan("/Users/bob/secret/config", { repoVisibility: "private" }).findings.find(
      (x) => x.id === "internal.user_path",
    );
    expect(f?.tier).toBe("LOW");
  });
  test("TODO marker is LOW", () => {
    const f = scan("TODO(alice) fix later", { repoVisibility: "private" }).findings.find(
      (x) => x.id === "hygiene.todo",
    );
    expect(f?.tier).toBe("LOW");
  });
});
describe("placeholder suppression (per-span)", () => {
  test("AWS docs EXAMPLE key not flagged", () => {
    expect(ids("AKIAIOSFODNN7EXAMPLE")).not.toContain("aws.access_key");
  });
  test("your_ prefix not flagged", () => {
    expect(isPlaceholderSpan("your_api_key")).toBe(true);
  });
  test("a real secret on a line that ALSO contains EXAMPLE still flags", () => {
    // line-based suppression would wrongly skip this; per-span must catch it.
    expect(ids("# EXAMPLE usage\nkey AKIA1234567890ABCDEF")).toContain("aws.access_key");
  });
});
describe("no visibility-based tier promotion (TENSION-2-followup)", () => {
  test("email stays MEDIUM on both private and public", () => {
    const priv = scan("x@corp.io", { repoVisibility: "private" }).findings[0];
    const pub = scan("x@corp.io", { repoVisibility: "public" }).findings[0];
    expect(priv.tier).toBe("MEDIUM");
    expect(pub.tier).toBe("MEDIUM");
    expect(pub.severity).toBe("MEDIUM"); // NOT promoted to HIGH
    expect(pub.repoVisibility).toBe("public"); // recorded for sterner wording
  });
  test("demoted credential patterns stay MEDIUM on public", () => {
    const pub = scan("pk_live_" + "a".repeat(30), { repoVisibility: "public" }).findings[0];
    expect(pub.severity).toBe("MEDIUM");
  });
  test("unknown visibility treated as public for wording, still no promotion", () => {
    const r = scan("x@corp.io", { repoVisibility: "unknown" });
    expect(r.findings[0].severity).toBe("MEDIUM");
  });
});
describe("tool-attributed fence WARN-degrade (TENSION-3)", () => {
  test("placeholder-shaped credential in tool fence → WARN", () => {
    const text = "```codex-review\nfound your_aws_key AKIAIOSFODNN7EXAMPLE in code\n```";
    const r = scan(text, { repoVisibility: "private" });
    // The EXAMPLE key is suppressed as a placeholder, so nothing else in the
    // fenced tool note should produce a blocking (HIGH) finding.
    expect(r.counts.HIGH).toBe(0);
  });
  test("live-format credential in tool fence STILL blocks", () => {
    const text = "```codex-review\nleaked AKIA1234567890ABCDEF here\n```";
    const r = scan(text, { repoVisibility: "private" });
    expect(r.counts.HIGH).toBe(1); // not degraded — live format
  });
  test("AKIA outside any fence blocks", () => {
    expect(exitCodeFor(scan("AKIA1234567890ABCDEF", {}))).toBe(3);
  });
});
describe("normalization", () => {
  test("zero-width chars inside a key are stripped before matching", () => {
    const zwsp = "\u200B"; // ZERO WIDTH SPACE, written as an escape so it stays visible
    const broken = "AKIA1234567890" + zwsp + "ABCDEF";
    expect(ids(broken)).toContain("aws.access_key");
  });
  test("HTML entity decode", () => {
    const { normalized } = normalizeWithMap("a &amp; b");
    expect(normalized).toBe("a & b");
  });
  test("offset map points back into original", () => {
    const input = "x\u200Byz";
    const { normalized, map } = normalizeWithMap(input);
    expect(normalized).toBe("xyz");
    // 'z' is at normalized index 2 but original index 3 (the stripped
    // zero-width char sat at original index 1).
    expect(map[2]).toBe(3);
  });
});
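
// Reference sketch (hypothetical; not the engine's normalizeWithMap, and not
// exercised by the tests above): one way to build the offset map the
// "normalization" tests rely on. Zero-width characters are dropped, and for
// every character that survives, map[i] records its index in the original
// string, so a match found at normalized offset i can be reported at original
// offset map[i]. The set of stripped code points is an assumption, and HTML
// entity decoding is left out to keep the sketch short.
const SKETCH_ZERO_WIDTH = new Set(["\u200B", "\u200C", "\u200D", "\u2060", "\uFEFF"]);

function sketchStripZeroWidth(input: string): { normalized: string; map: number[] } {
  let normalized = "";
  const map: number[] = [];
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (SKETCH_ZERO_WIDTH.has(ch)) continue; // stripped: gets no map entry
    normalized += ch;
    map.push(i); // normalized index map.length - 1 came from original index i
  }
  return { normalized, map };
}
// e.g. sketchStripZeroWidth("x\u200Byz") gives normalized "xyz" and map [0, 2, 3].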
describe("oversize fails CLOSED", () => {
  test("input over the byte cap returns a single blocking HIGH finding", () => {
    const big = "a".repeat(2000);
    const r = scan(big, { maxBytes: 1000 });
    expect(r.oversize).toBe(true);
    expect(r.counts.HIGH).toBe(1);
    expect(r.findings[0].id).toBe("engine.input_too_large");
    expect(exitCodeFor(r)).toBe(3);
  });
  // #1824: a malformed --max-bytes used to reach the engine as NaN. `byteLen >
  // NaN` is always false, silently disabling the fail-closed guard. The engine
  // guardrail must fall back to the default cap for any non-finite / <= 0 value.
  test("NaN maxBytes falls back to the default cap (does NOT disable the guard)", () => {
    const big = "a".repeat(2 * 1024 * 1024); // > 1 MiB default cap
    const r = scan(big, { maxBytes: NaN });
    expect(r.oversize).toBe(true);
    expect(r.findings[0].id).toBe("engine.input_too_large");
    expect(exitCodeFor(r)).toBe(3);
  });
  test("negative / zero maxBytes falls back to the default cap", () => {
    // A negative or zero cap would make `byteLen > cap` true for any non-empty
    // input (block everything); the guardrail normalizes it to the default instead.
    const small = "ok";
    expect(scan(small, { maxBytes: -5 }).oversize).toBeFalsy();
    expect(scan(small, { maxBytes: 0 }).oversize).toBeFalsy();
    const big = "a".repeat(2 * 1024 * 1024);
    expect(scan(big, { maxBytes: -5 }).oversize).toBe(true);
  });
});
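
// Reference sketch (hypothetical helper names, not the engine's code) of the
// #1824 guardrail described above: a maxBytes that is not a finite, positive
// number falls back to the default cap. That keeps NaN from making
// `byteLen > cap` always false (guard disabled) and keeps a negative or zero
// cap from making it always true for non-empty input (everything blocked).
// The 1 MiB default comes from the test comment above; measuring UTF-8 bytes
// with TextEncoder is an assumption.
const SKETCH_DEFAULT_MAX_BYTES = 1024 * 1024;

function sketchEffectiveMaxBytes(maxBytes?: number): number {
  // Number.isFinite rejects NaN and +/-Infinity; the > 0 check rejects -5 and 0.
  return typeof maxBytes === "number" && Number.isFinite(maxBytes) && maxBytes > 0
    ? maxBytes
    : SKETCH_DEFAULT_MAX_BYTES;
}

function sketchIsOversize(text: string, maxBytes?: number): boolean {
  const byteLen = new TextEncoder().encode(text).length;
  return byteLen > sketchEffectiveMaxBytes(maxBytes);
}
// e.g. sketchEffectiveMaxBytes(NaN), (-5) and (0) all return 1048576.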
describe("validators", () => {
  test("luhn", () => {
    expect(luhnValid("4111111111111111")).toBe(true);
    expect(luhnValid("4111111111111112")).toBe(false);
  });
  test("entropy", () => {
    expect(shannonEntropy("aaaaaaaa")).toBeLessThan(1);
    expect(shannonEntropy("8Fk2pQ9vXz4wL7mN")).toBeGreaterThan(3);
  });
  test("isPublicIPv4", () => {
    expect(isPublicIPv4("8.8.8.8")).toBe(true);
    expect(isPublicIPv4("10.1.2.3")).toBe(false);
    expect(isPublicIPv4("172.16.5.5")).toBe(false);
    expect(isPublicIPv4("999.1.1.1")).toBe(false);
  });
});
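
// Reference sketches (hypothetical; not the engine's luhnValid, shannonEntropy
// or isPublicIPv4) of the three validators tested above. Luhn doubles every
// second digit from the right, subtracts 9 from any doubled value above 9, and
// passes when the total is divisible by 10. Shannon entropy is measured in
// bits per character, so 16 distinct characters score log2(16) = 4. Treating
// 0/8, loopback and link-local as non-public, in addition to the three
// RFC 1918 blocks, is an assumption about what the real validator excludes.
function sketchLuhnValid(digits: string): boolean {
  if (!/^\d+$/.test(digits)) return false;
  let sum = 0;
  let double = false; // the rightmost digit is never doubled
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

function sketchShannonEntropy(s: string): number {
  const counts = new Map<string, number>();
  let total = 0;
  for (const ch of s) {
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
    total++;
  }
  let bits = 0;
  for (const n of counts.values()) {
    const p = n / total;
    bits -= p * Math.log2(p);
  }
  return bits; // 0 for "" or a single repeated character
}

function sketchIsPublicIPv4(ip: string): boolean {
  const parts = ip.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) return false;
  const octets = parts.map(Number);
  if (octets.some((n) => n > 255)) return false; // e.g. "999.1.1.1"
  const [a, b] = octets as [number, number, number, number];
  if (a === 10) return false; // 10.0.0.0/8 (RFC 1918)
  if (a === 172 && b >= 16 && b <= 31) return false; // 172.16.0.0/12 (RFC 1918)
  if (a === 192 && b === 168) return false; // 192.168.0.0/16 (RFC 1918)
  if (a === 0 || a === 127) return false; // "this network", loopback
  if (a === 169 && b === 254) return false; // link-local
  return true;
}
// e.g. sketchLuhnValid("4111111111111111") is true; sketchShannonEntropy("aaaaaaaa") is 0.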
describe("masking + purity", () => {
  test("preview never leaks more than 4 leading chars", () => {
    expect(maskPreview("AKIA1234567890ABCDEF")).toBe("AKIA********…");
    expect(maskPreview("abc")).toBe("abc");
  });
  test("scan is pure — same input twice yields identical findings", () => {
    const a = scan("AKIA1234567890ABCDEF x@corp.io", { repoVisibility: "public" });
    const b = scan("AKIA1234567890ABCDEF x@corp.io", { repoVisibility: "public" });
    expect(a).toEqual(b);
  });
});
describe("redactFindingSpans — machine-egress masking (#1947)", () => {
  test("clean input passes through unchanged", () => {
    const text = "push failed: remote rejected the branch";
    expect(redactFindingSpans(text, { repoVisibility: "private" })).toBe(text);
  });
  test("a single finding's span becomes <REDACTED-{id}>, context survives", () => {
    const token = "ghp_" + "1234567890abcdefghijklmnopqrstuvwxyz";
    const out = redactFindingSpans(`auth ${token} rejected`, { repoVisibility: "private" });
    expect(out).toBe("auth <REDACTED-github.pat> rejected");
  });
  test("multiple findings are all replaced (right-to-left splice keeps offsets valid)", () => {
    const pat = "ghp_" + "1234567890abcdefghijklmnopqrstuvwxyz";
    const aws = "AKIA1234567890ABCDEF";
    const out = redactFindingSpans(`first ${aws} then ${pat} end`, {
      repoVisibility: "private",
    });
    expect(out).toBe("first <REDACTED-aws.access_key> then <REDACTED-github.pat> end");
  });
  test("fails closed (null) when a span cannot be relocated — never raw passthrough", () => {
    // env.kv's span (the value) starts well past the regex match start (the
    // var name), so locateSpan's rewind-2 re-exec misses it. The contract is
    // null → caller drops the whole payload. The one thing that must never
    // happen is the secret surviving in the output.
    const secret = "8Fk2pQ9vXz4wL7mN3rT6yB1cD5eG0hJq";
    const out = redactFindingSpans(`API_KEY=${secret}`, { repoVisibility: "private" });
    if (out !== null) {
      // If locateSpan ever learns to find context-prefixed spans, masking
      // must actually mask.
      expect(out).not.toContain(secret);
    } else {
      expect(out).toBeNull();
    }
  });
  test("multiline input redacts a finding past the first line (locateSpan line/col path)", () => {
    const token = "ghp_" + "1234567890abcdefghijklmnopqrstuvwxyz";
    const out = redactFindingSpans(`line one\nline two has ${token}\nline three`, {
      repoVisibility: "private",
    });
    expect(out).toBe("line one\nline two has <REDACTED-github.pat>\nline three");
  });
  // Pre-landing review CRITICAL: pem.private_key and gcp.service_account
  // capture only the HEADER, not the key material — a span splice would
  // redact the marker and forward the key body. Marker-only patterns must
  // drop the whole payload.
  test("PEM private key → null (header-only span must not forward the key body)", () => {
    const msg =
      "deploy failed: -----BEGIN PRIVATE KEY-----\nMIIEvQIBADANBgkqhkiG9w0BAQEFAASC\n-----END PRIVATE KEY-----";
    expect(redactFindingSpans(msg, { repoVisibility: "private" })).toBeNull();
  });
  test("GCP service-account JSON → null (key body follows the captured marker)", () => {
    const msg =
      'config dump: {"private_key_id": "abc123", "private_key": "-----BEGIN PRIVATE KEY-----\\nMIIEvQIBADANBg..."}';
    expect(redactFindingSpans(msg, { repoVisibility: "private" })).toBeNull();
  });
  // Pre-landing review: overlapping spans (a Bearer token that is also a
  // JWT) must coalesce — independent splices apply stale offsets and can
  // leave trailing secret bytes or mangled markers.
  test("overlapping spans (Bearer JWT fires auth.bearer + jwt) never leak and produce clean markers", () => {
    const jwt = "eyJ" + "a".repeat(20) + ".eyJ" + "b".repeat(20) + "." + "c".repeat(20);
    const out = redactFindingSpans(`Authorization: Bearer ${jwt}`, { repoVisibility: "private" });
    expect(out).not.toBeNull();
    expect(out!).not.toContain("eyJ");
    expect(out!).not.toContain("aaaa");
    expect(out!).not.toContain("cccc");
    // One coalesced, well-formed marker — no truncated fragments.
    expect(out!).toMatch(/^Authorization: Bearer <REDACTED-[a-z._+]+>$/);
  });
});
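
// Reference sketch (hypothetical types and names, not redactFindingSpans
// itself) of the two splice rules the tests above pin down. Overlapping spans
// are coalesced first, so a Bearer token that is also a JWT gets one marker
// instead of two splices computed against stale offsets. The merged spans are
// then spliced right to left, so replacing a later span never shifts the
// offsets of the spans before it. Joining merged ids with "+" is an
// assumption, picked to fit the /<REDACTED-[a-z._+]+>/ shape asserted above.
// Span relocation and the fail-closed null cases are out of scope here.
interface SketchSpan {
  start: number; // inclusive offset into text
  end: number; // exclusive offset into text
  id: string;
}

function sketchRedactSpans(text: string, spans: SketchSpan[]): string {
  // 1. Sort by start and merge any span that begins before the previous one ends.
  const sorted = [...spans].sort((x, y) => x.start - y.start);
  const merged: { start: number; end: number; ids: string[] }[] = [];
  for (const s of sorted) {
    const last = merged[merged.length - 1];
    if (last && s.start < last.end) {
      last.end = Math.max(last.end, s.end);
      if (!last.ids.includes(s.id)) last.ids.push(s.id);
    } else {
      merged.push({ start: s.start, end: s.end, ids: [s.id] });
    }
  }
  // 2. Splice from the rightmost span leftward so earlier offsets stay valid.
  let out = text;
  for (let i = merged.length - 1; i >= 0; i--) {
    const m = merged[i]!;
    out = out.slice(0, m.start) + `<REDACTED-${m.ids.join("+")}>` + out.slice(m.end);
  }
  return out;
}
// e.g. sketchRedactSpans("0123456789", [{ start: 2, end: 5, id: "a" }, { start: 4, end: 8, id: "b" }])
// returns "01<REDACTED-a+b>89".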
describe("taxonomy integrity", () => {
  test("every pattern has a unique id", () => {
    const set = new Set(PATTERNS.map((p) => p.id));
    expect(set.size).toBe(PATTERNS.length);
  });
  test("autoRedactable patterns have a redactToken", () => {
    for (const p of PATTERNS) {
      if (p.autoRedactable) expect(p.redactToken).toBeTruthy();
    }
  });
});