diff --git a/CHANGELOG.md b/CHANGELOG.md
index 139ca8ac5..8fc55131a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,49 @@
 # Changelog
+## [1.53.0.0] - 2026-05-29
+
+## **Secrets, PII, and legal landmines get caught before they reach a public sink. One redaction engine now guards /spec, /ship, /cso, and the /document-* skills.**
+
+`/spec` used to scan for seven secret patterns and only blocked the codex hand-off. Everything after that — the GitHub issue it filed, the local archive — went out unscanned. So you could pull an AWS key out of the draft, re-run, and still publish a customer's email to a world-readable issue. That gap is closed. A single shared engine (`lib/redact-patterns.ts` + `lib/redact-engine.ts`, driven by the new `gstack-redact` CLI) now scans the exact bytes that will be sent, at every sink: the codex dispatch, the issue body, the archive write, the PR body and title, and generated docs before they commit. HIGH-confidence credentials block. PII and legal/damaging content (a named person tied to "fired", a customer tied to "churn", NDA markers) prompt you per finding, with one-keystroke auto-redact for emails, phones, SSNs, and cards. Public repos get a sterner bar than private ones.
+
+It is a guardrail, not a vault. `git push --no-verify`, a direct `gh issue create`, and `GSTACK_REDACT_PREPUSH=skip` all still get through. It catches accidents and carelessness, which is where real leaks come from.
+
+### The numbers that matter
+
+From the shipped engine and its test suite (`bun test test/redact-*.test.ts` and the per-skill wiring tests):
+
+| Metric | Before (v1.52) | After (v1.53) | Δ |
+|--------|----------------|---------------|---|
+| Redaction patterns | 7 (secrets only) | 33 (secrets + PII + legal + internal) | +26 |
+| Tiers | 1 (block) | 3 (block / confirm / FYI) | +2 |
+| Enforcement sinks in /spec | 1 (codex only) | 3 (codex, issue, archive) | +2 |
+| Skills guarded | 1 (/spec) | 5 (/spec, /ship, /cso, /document-release, /document-generate) | +4 |
+| Redaction tests | ~5 string checks | 159 behavior tests | +154 |
+
+Tier split of the 33 patterns: 17 HIGH (genuinely-secret credentials), 14 MEDIUM (PII, legal, internal-leak, plus high-FP credential shapes), 2 LOW. Calibration is the point: Stripe publishable keys, Google `AIza` keys, JWTs, and env-style `*_KEY=` sit at MEDIUM, not HIGH, because a gate that cries wolf gets muted.
+
+### What this means for you
+
+When you `/spec` or `/ship`, you no longer have to remember that the issue body is public. A real credential stops the operation cold and tells you to rotate it. An email or a sentence naming a coworker surfaces as a question, with auto-redact one keystroke away. Turn on the optional pre-push hook (`gstack-config set redact_prepush_hook true`) to catch the classic `.env`-into-the-diff push too. Nothing new to learn: it runs inside the skills you already use.
+
+### Itemized changes
+
+#### Added
+- **Shared redaction engine.** `lib/redact-patterns.ts` (33-pattern, 3-tier taxonomy — the single source of truth) and `lib/redact-engine.ts` (pure `scan()` + `applyRedactions()` with Unicode normalization, ReDoS-safe size cap, Luhn/entropy/RFC1918 validators, safe-masked previews).
+- **`gstack-redact` CLI** — scan stdin or a file, JSON or human output, exit 0/2/3 to gate skills, `--auto-redact` for the PII one-keystroke path, `--repo-visibility`, `--allowlist`, `--self-email`.
+- **Opt-in pre-push hook** (`gstack-redact-prepush` + `gstack-redact install-prepush-hook`) — blocks a credential in the pushed diff (public and private), correct `remote..local` diff direction with new-branch/force-push/delete handling, chains any existing hook, `GSTACK_REDACT_PREPUSH=skip` escape valve.
+- **`/spec` Phase 4.5a semantic review** — an in-conversation pass (no third party) for named-criticism, customer complaints, unannounced strategy, NDA material, and codename bleed, with a content-free audit trail at `~/.gstack/security/semantic-reviews.jsonl`.
+- **Config keys** `redact_repo_visibility` (local-only override for repos `gh`/`glab` can't read) and `redact_prepush_hook`.
+
+#### Changed
+- **`/spec`, `/ship`, `/document-release`, `/document-generate`** scan at every external sink, on the exact bytes sent (temp-file scan-at-sink, no scan-then-re-render gap). `/ship` wraps Codex/Greptile output in tool-attributed fences so the example credentials those tools quote degrade to a non-blocking warning instead of failing the PR.
+- **`/cso`** shares the same canonical taxonomy via `lib/redact-patterns.ts` for its secrets archaeology.
+
+#### For contributors
+- Skill docs for the redaction surface are generated from `scripts/resolvers/redact-doc.ts` (`{{REDACT_TAXONOMY_TABLE}}`, `{{REDACT_INVOCATION_BLOCK:}}`), so the five skills never drift from the engine.
+- 12 new test files, 159 redaction assertions, plus a periodic-tier semantic-pass eval (`test/redact-semantic-pass.eval.ts`).
+- Known pre-existing: the legacy `test/parity-suite.test.ts` (v1.44.1 baseline) reports 5 planning-skill size regressions inherited from the brain-aware-planning releases (v1.49–v1.52); they are unrelated to this branch and the active v1.47 size-budget gate passes. Tracked in TODOS.md to rebaseline.
+
 ## [1.52.2.0] - 2026-05-29
 ## **Emoji render in make-pdf PDFs on every platform. Linux stops printing tofu boxes, and setup installs the font for you.**
diff --git a/TODOS.md b/TODOS.md
index 7952e1c26..d3c32bc72 100644
--- a/TODOS.md
+++ b/TODOS.md
@@ -1,5 +1,29 @@
 # TODOS
+## Test infrastructure
+
+### P0: Rebaseline parity-suite (v1.44.1) — stale, 5 pre-existing failures
+
+**What:** `test/parity-suite.test.ts` checks every skill's SKILL.md size against
+the frozen `test/fixtures/parity-baseline-v1.44.1.json`. Five planning skills now
+exceed the 1.05x ceiling: `plan-ceo-review` (1.052), `plan-eng-review` (1.062),
+`plan-design-review` (1.068), `investigate` (1.053), `office-hours` (1.065).
+
+**Why:** These grew during the brain-aware-planning releases (v1.49–v1.52) which
+added the `BRAIN_PREFLIGHT`/`BRAIN_CACHE_REFRESH`/`BRAIN_WRITE_BACK` resolvers to
+those skills. The v1.44.1 baseline was never regenerated, so it's four releases
+stale. The failures are pre-existing on `origin/main` (proven: they fail with the
+redaction branch absent). The active size gate (`skill-size-budget`, v1.47 baseline)
+passes, and parity-suite is not in CI's `test:gate`, so nothing is blocked — but the
+local `bun test` shows red until rebaselined.
+
+**How to start:** Either regenerate the fixture to a current baseline
+(`bun run scripts/capture-baseline.ts ` and point the test at it), or bump the
+per-skill ratio for the planning skills. Decide whether v1.44.1 should be retired in
+favor of the v1.47 baseline the size-budget test already uses.
+
+**Depends on:** nothing. Standalone.
+
+## gbrowser memory follow-ups (filed via /plan-eng-review + /codex on the v1.49 leak-fix PR)
 These four items came out of the memory-leak investigation that shipped
diff --git a/VERSION b/VERSION
index d7f9d8f6c..b8c5f21a9 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.52.2.0
+1.53.0.0
diff --git a/package.json b/package.json
index a08f31dc7..75d05e770 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "gstack",
-  "version": "1.52.2.0",
+  "version": "1.53.0.0",
   "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
   "license": "MIT",
   "type": "module",