diff --git a/TODOS.md b/TODOS.md index 88e70ee8..27e7896e 100644 --- a/TODOS.md +++ b/TODOS.md @@ -4,17 +4,16 @@ ### Pre-existing test failures surfaced during v1.12.0.0 ship -**What:** Three test failures on bare main that have been shipping as-is for multiple versions: +**What:** Two remaining test failures on bare main that have been shipping as-is for multiple versions. (The bearer-json secret-scan regression flagged here originally was a real leak path and has been fixed in this PR — see Completed section below.) 1. `gstack-config gbrain keys > GSTACK_HOME overrides real config dir` (`test/brain-sync.test.ts:104`) — the GSTACK_HOME env override leaks into the real `~/.gstack/config.yaml`. Test asserts real config does NOT contain `gbrain_sync_mode: full` but it does. Either the test environment isn't isolated correctly or `bin/gstack-config` is writing to both locations. -2. `gstack-brain-sync secret scan > blocks bearer-json` (`test/brain-sync.test.ts:303`) — the bearer-token secret scanner expects `status: "blocked"` but sees `"ok"`. The JSON-format bearer-token pattern isn't being caught. Check the regex in the Python scanner inside `bin/gstack-brain-sync` vs the test fixture content. -3. `Opus 4.7 overlay — pacing directive > keeps Fan out / Effort-match / Literal interpretation nudges` (`test/model-overlay-opus-4-7.test.ts:87`) — v1.10.1.0 (#1166) removed the "Fan out explicitly" nudge from the overlay but the assertion was never updated. Either the nudge should come back (intentional removal reverted) or the test should be updated to match the new expected content. +2. `Opus 4.7 overlay — pacing directive > keeps Fan out / Effort-match / Literal interpretation nudges` (`test/model-overlay-opus-4-7.test.ts:87`) — v1.10.1.0 (#1166) removed the "Fan out explicitly" nudge from the overlay but the assertion was never updated. Either the nudge should come back (intentional removal reverted) or the test should be updated to match the new expected content. -**Why:** All three have been green-washing through recent `/ship` runs via "pre-existing test failures skipped: ." Each one either signals a real isolation bug (#1), a security regression (#2), or a stale assertion (#3). +**Why:** Both have been green-washing through recent `/ship` runs via "pre-existing test failures skipped: ." #1 signals a real config isolation bug; #2 is a stale assertion since the overlay intentionally removed that nudge. -**Priority:** P0 +**Priority:** P0 (both) -**Effort:** S-M each. #1 likely a test harness fix. #2 might be a real scanner regression — investigate before shipping more memory-sync changes. #3 is a one-line test update OR a revert of #1166. +**Effort:** S each. #1 likely a test harness fix in `test/brain-sync.test.ts`'s setup hook. #2 is a one-line test update OR a revert of #1166. --- @@ -1320,6 +1319,17 @@ Shipped in v0.6.5. TemplateContext in gen-skill-docs.ts bakes skill name into pr ## Completed +### Bearer-token secret-scan regression fixed + E2E coverage added for privacy gate + gh auto-create (v1.12.0.0) + +- **Fixed the `bearer-token-json` regression in `bin/gstack-brain-sync`** — the value charset `[A-Za-z0-9_./+=-]{16,}` didn't permit spaces, so auth headers with the standard `Bearer ` form (literal space after the scheme name) slipped past the scanner. Added an optional `(Bearer |Basic |Token )?` prefix to the pattern. Validated against 5 positive cases (including the regression fixture) + 3 negative cases (short tokens, non-secret keys, random JSON). The 7-pattern secret scanner now passes all fixtures including bearer-json. +- **Added `test/gstack-brain-init-gh-mock.test.ts`** — 8 tests exercising the `gh` CLI auto-create path that previously had zero coverage. Stubs `gh` on PATH to record every call, asserts `gh repo create --private --description "..." --source ` fires with the computed `gstack-brain-` default name. Covers: happy path, fall-through-to-`gh repo view` when create hits already-exists, user-provided-URL-bypasses-gh, gh-not-on-path prompts for URL, gh-not-authed prompts for URL, idempotent `--remote` re-runs, conflicting-remote rejection. +- **Added `test/skill-e2e-brain-privacy-gate.test.ts`** — periodic-tier E2E (~$0.30-$0.50/run). Stages a fake `gbrain` on PATH + `gbrain_sync_mode_prompted=false` in config, runs a real skill via `runAgentSdkTest`, intercepts tool-use via `canUseTool`, and asserts the preamble fires the 3-option privacy AskUserQuestion with canonical prose ("publish session memory" / "artifact" / "decline"). Second test asserts the gate is silent when `prompted=true` (idempotency-within-session). +- **Registered `brain-privacy-gate` in `test/helpers/touchfiles.ts`** (periodic tier) with dependency tracking on `scripts/resolvers/preamble/generate-brain-sync-block.ts`, `bin/gstack-brain-sync`, `bin/gstack-brain-init`, `bin/gstack-config`, and the Agent SDK runner. Diff-based selection will re-run the E2E whenever any of those change. + +**Completed:** v1.12.0.0 (2026-04-24) + +--- + ### Overlay efficacy harness + Opus 4.7 fanout nudge removal (v1.10.1.0) - Built `test/skill-e2e-overlay-harness.test.ts`, a parametric periodic-tier eval that drives `@anthropic-ai/claude-agent-sdk` and measures first-turn fanout rate (overlay-ON vs overlay-OFF) across registered fixtures - Measured the original "Fan out explicitly" overlay nudge: baseline Opus 4.7 = 70% first-turn fanout on toy prompt, with our nudge = 10%, with Anthropic's own canonical `` text = 0%