Commit Graph

4 Commits

Author SHA1 Message Date
Garry Tan 2fa6312190 Merge remote-tracking branch 'origin/main' into garrytan/triage-open-issues 2026-06-07 22:59:06 -07:00
Garry Tan b89b911a4a fix(ship): adversarial subagent no longer trips usage-policy denial on own security fixtures (#1899)
The Claude adversarial subagent in /review and /ship was told to "think like an
attacker" over the full diff. When the diff includes the repo's own security
regression fixtures (real attack payloads, by design), reasoning adversarially
over that material triggered Anthropic's real-time usage-policy safeguards and
the subagent call was denied — blocking the review.

Fix at the prompt's source of truth (scripts/resolvers/review.ts {{ADVERSARIAL_STEP}}):
- Authorized-defensive-testing framing: declares this is the maintainer's own repo
  and that attack-pattern strings inside test/fixture paths are the project's own
  regression corpus to analyze, not material to expand on.
- Fixture summary-mode diff: full content for non-fixture source, --stat/--name-status
  for test/fixture files, so raw exploit bytes aren't fed into adversarial reasoning.
  The subagent must state fixtures were reviewed in summary mode (no silent coverage cut).
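A minimal sketch of the summary-mode split. It assumes a hypothetical `planAdversarialDiff` helper and an illustrative `FIXTURE_PATH` rule, because the real resolver's path matching is not shown in this commit. Non-fixture paths get a full `git diff`. Fixture paths get `--stat` only. The fixture count lets the subagent report that coverage was summarized rather than dropped.

```typescript
// Hypothetical fixture rule: any path segment named test/, tests/, fixture/ or fixtures/.
const FIXTURE_PATH = /(^|\/)(test|tests|fixtures?)\//;

interface DiffPlan {
  fullDiffArgs: string[];    // git args: full content for non-fixture source
  summaryDiffArgs: string[]; // git args: --stat only for test/fixture files
  fixtureCount: number;      // lets the subagent state fixtures were summarized
}

function planAdversarialDiff(base: string, changed: string[]): DiffPlan {
  const fixtures = changed.filter((p) => FIXTURE_PATH.test(p));
  const source = changed.filter((p) => !FIXTURE_PATH.test(p));
  return {
    fullDiffArgs: source.length > 0 ? ["diff", base, "--", ...source] : [],
    summaryDiffArgs: fixtures.length > 0 ? ["diff", "--stat", base, "--", ...fixtures] : [],
    fixtureCount: fixtures.length,
  };
}
```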

Reported by @bmajewski. Regenerated review/SKILL.md + ship/sections/adversarial.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 22:46:49 -07:00
Garry Tan d8c91c6267 v1.57.3.0 fix(ship): always-loaded PR-title-version rule + fork-PR title-sync backstop (#1909)
* fix(ship): restore always-loaded PR-title-version invariant to skeleton

The v1.54.0.0 carve moved the 'PR title MUST start with v$NEW_VERSION' rule
out of the always-loaded ship skeleton and entirely into the lazily-loaded
pr-body.md section. The agent only set the version prefix if it happened to
read that section before creating the PR, so PRs landed with bare titles.

Restore a one-line invariant (+ helper reference) to ship/SKILL.md.tmpl right
before the {{SECTION:pr-body}} pointer, mirroring the AUQ always-loaded
precedent. Full procedure stays sectioned. Regenerated all hosts.
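The invariant is small enough to state as code. This sketch uses a hypothetical `withVersionPrefix` helper. The real helper referenced from ship/SKILL.md.tmpl is not named in this commit. The sketch is idempotent, so re-running /ship on a title that already has the prefix changes nothing.

```typescript
// Hypothetical helper name; the real rewrite helper is only referenced, not shown.
function withVersionPrefix(title: string, newVersion: string): string {
  const prefix = `v${newVersion}`;
  // Already prefixed with the new version: return unchanged (idempotent re-runs).
  if (title === prefix || title.startsWith(`${prefix} `)) return title;
  // Strip a stale leading version such as "v1.57.2.0 " before adding the new one.
  const bare = title.replace(/^v\d+(?:\.\d+)*\s+/, "");
  return `${prefix} ${bare}`;
}
```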

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): guard PR-title-version rule + pull_request_target safety

Two free gate tests so a future carve or workflow refactor can't silently
regress:

- ship-pr-title-version-always-loaded: asserts the invariant lives in the
  always-loaded ship/SKILL.md skeleton (not only sections/), and that the
  skeleton+sections union keeps BOTH the create and the existing-PR update
  title paths. Modeled on test/auq-format-always-loaded.test.ts.
- pr-title-sync-workflow-safety: static tripwire that fails CI if
  pr-title-sync.yml checks out PR-head code or inlines an attacker-controlled
  ${{ github.event.pull_request.* }} field inside a run: block (the two
  pull_request_target footguns actionlint cannot catch).
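A rough sketch of such a tripwire. It scans line by line instead of parsing YAML. `scanWorkflow` and its indentation heuristic are hypothetical and are not the actual test's implementation.

```typescript
interface Violation { line: number; reason: string }

// Line-based heuristic scan of a workflow file (not a full YAML parser).
function scanWorkflow(yaml: string): Violation[] {
  const out: Violation[] = [];
  let runIndent = -1; // column of the active `run:` key, or -1 outside a run block
  yaml.split("\n").forEach((text, i) => {
    const trimmed = text.trim();
    const indent = text.search(/\S|$/); // index of first non-space character
    // A non-blank line at or left of the run: key ends that run block.
    if (runIndent >= 0 && trimmed !== "" && indent <= runIndent) runIndent = -1;
    // Footgun 1: checking out PR-head code (ref: pointing at the PR head).
    if (/^ref:.*pull_request\.head/.test(trimmed)) {
      out.push({ line: i + 1, reason: "checks out PR-head code" });
    }
    const run = /^(-\s+)?run:/.exec(trimmed);
    if (run) runIndent = indent + (run[1] ? run[1].length : 0);
    // Footgun 2: attacker-controlled PR field expanded inside a run: script.
    if (runIndent >= 0 && /\$\{\{\s*github\.event\.pull_request\./.test(text)) {
      out.push({ line: i + 1, reason: "inlines pull_request field in run:" });
    }
  });
  return out;
}
```

Expressions under `env:` are not flagged because they sit outside any `run:` block, which matches the env-var pattern the workflow fix adopts.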

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(ci): pr-title-sync covers fork PRs via hardened pull_request_target

Under plain pull_request the GITHUB_TOKEN is read-only on fork PRs, so the
title-sync backstop could never edit a fork/agent PR title. Switch to
pull_request_target (write token in base context) and make it safe:

- Check out the base repo only (no ref:) — execute trusted infra, never
  fork-head code.
- All attacker-controlled PR fields (title, head repo, head sha) pass via
  env: and are referenced as shell-quoted "$VAR", never inlined into run:.
- Read the PR-head VERSION as data (raw media type) from the head repo at the
  head sha; guard the assignment under set -e.
- Same-repo read failure fails loudly; fork miss warns and skips (the backstop
  stays green without going silently optional).
- Never echo the raw fork title (Actions parses ::workflow-command:: from stdout).
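The rule "fail loudly for the same repo, warn and skip for a fork" can be sketched as a pure decision function. `decideTitleSync` and its types are hypothetical. The workflow itself expresses this logic in shell under set -e.

```typescript
type ReadResult = { ok: true; version: string } | { ok: false };
type SyncAction =
  | { kind: "sync"; version: string } // rewrite the title with this version
  | { kind: "fail" }                  // same-repo read failed: fail the job loudly
  | { kind: "skip" };                 // fork head unreadable: warn and skip

function decideTitleSync(read: ReadResult, isFork: boolean): SyncAction {
  if (read.ok) return { kind: "sync", version: read.version.trim() };
  return isFork ? { kind: "skip" } : { kind: "fail" };
}
```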

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(ship): expand binDir path in pr-body Linked Spec block

ship/sections/pr-body.md.tmpl:98-99 used ${ctx.paths.binDir}, but the
gen-skill-docs generator only resolves {{TOKEN}} syntax in .tmpl files — the
${...} JS-template-literal form is substituted only inside .ts resolver files.
So the token passed through literally into the generated pr-body.md, leaving the
agent with an unexpandable ${ctx.paths.binDir}/gstack-paths command in the
Linked Spec auto-detect block. Use the hardcoded helper path, consistent with
every other path reference in this section.
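A minimal sketch of the failure mode. A resolver that only recognizes `{{TOKEN}}` copies a `${...}` template-literal expression through unchanged. `resolveTmpl` and the `BIN_DIR` token are hypothetical.

```typescript
// Minimal {{TOKEN}} substitution, as the commit describes for .tmpl files.
// Anything else, including JS-style ${...}, is copied through verbatim.
function resolveTmpl(src: string, tokens: Record<string, string>): string {
  return src.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(tokens, name) ? tokens[name] : match,
  );
}
```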

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(test): fold ship PR-title skeleton guard into carve-guard registry

main shipped a generalized carve-guard system (PR #1907) that is now the single
source of truth for carved-skill skeleton invariants. Register the PR-title rule
there instead of a standalone test: ship's mustStayInSkeleton asserts v$NEW_VERSION
+ the rewrite helper stay always-loaded, and mustMoveToSection asserts both the
create and update PR paths stay carved into pr-body.md (present in the union, out of
the skeleton). Delete the standalone ship-pr-title-version-always-loaded test it
replaces. The CI-workflow safety tripwire stays standalone (not a carve concern).
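A sketch of the two assertion kinds, assuming a hypothetical registry shape (the real schema from PR #1907 is not shown). `mustStayInSkeleton` phrases must appear in the skeleton. `mustMoveToSection` phrases must appear in the skeleton+sections union but not in the skeleton.

```typescript
// Hypothetical registry entry shape; PR #1907's actual schema is not shown here.
interface CarveGuard {
  mustStayInSkeleton: string[]; // must appear in the always-loaded skeleton
  mustMoveToSection: string[];  // must appear in sections/, and not in the skeleton
}

function checkCarve(guard: CarveGuard, skeleton: string, sections: string[]): string[] {
  const union = [skeleton, ...sections].join("\n");
  const errors: string[] = [];
  for (const phrase of guard.mustStayInSkeleton) {
    if (!skeleton.includes(phrase)) errors.push(`not always-loaded: ${phrase}`);
  }
  for (const phrase of guard.mustMoveToSection) {
    if (!union.includes(phrase)) errors.push(`lost from skeleton+sections: ${phrase}`);
    else if (skeleton.includes(phrase)) errors.push(`still in skeleton: ${phrase}`);
  }
  return errors;
}
```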

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.57.3.0)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 22:04:18 -07:00
Garry Tan 46c1fae7f1 v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded) (#1806)
* feat(test): transcript-section-logger + ship-action fingerprint (T10)

Pure-analysis module over a SkillTestResult/NDJSON transcript:
- extractSectionReads(): which sections/*.md a run opened (post-carve check)
- extractShipActions(): observable action fingerprint (merge/test/bump/
  changelog/commit/push/pr) that works on the MONOLITH too, so a baseline
  captured before the carve can detect a sectioned-ship regression
- baseline read/write + compareShipActions() for baseline-first dogfooding (T10)
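A sketch of `extractSectionReads()`. It assumes a Claude-style stream-json transcript in which each line is an event whose `message.content` holds `tool_use` blocks carrying `input.file_path`. The real SkillTestResult and NDJSON shapes may differ.

```typescript
// Assumed transcript shape (Claude-style stream-json); the real format may differ.
interface ContentBlock { type: string; name?: string; input?: { file_path?: string } }
interface TranscriptEvent { message?: { content?: ContentBlock[] } }

function extractSectionReads(ndjson: string): string[] {
  const reads = new Set<string>();
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue;
    let event: TranscriptEvent | null;
    try {
      event = JSON.parse(line) as TranscriptEvent | null;
    } catch {
      continue; // tolerate truncated or non-JSON lines
    }
    for (const block of event?.message?.content ?? []) {
      const path = block.input?.file_path;
      if (block.type !== "tool_use" || block.name !== "Read" || !path) continue;
      const match = /(?:^|\/)sections\/([^/]+\.md)$/.exec(path);
      if (match) reads.add(match[1]); // record just the section file name
    }
  }
  return Array.from(reads).sort();
}
```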

Baseline-first answers the Codex outside-voice critique that a logger in the
same PR as the carve is post-failure telemetry without a pre-carve reference.

11 unit tests, all green. Paid monolith baseline capture runs separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(pipeline): section discovery + generation machinery (T9)

- discover-skills.ts: discoverSectionTemplates() scans <skill>/sections/*.md.tmpl
- gen-skill-docs.ts: extract resolvePlaceholders + applyHostRewrites + buildContext
  as shared helpers (processTemplate and the new processSectionTemplate both call
  them, so a sanitization/rewrite fix can't miss sections) [C1]
- processSectionTemplate: body-fragment generation (no frontmatter/catalog/voice),
  parent-skill TemplateContext (skillName pinned to parent, not 'sections', so
  appliesTo gating + tier behave identically), per-host output routing
- --host all now fails the build on ANY host failure, not just claude, so a stale
  external-host output can't slip the freshness gate [Codex outside-voice #9]
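A pure sketch of section discovery over a list of repo-relative paths. The real `discoverSectionTemplates()` scans the filesystem. Here the returned `skill` field stands in for the parent skillName that processSectionTemplate pins, so the value is never 'sections'.

```typescript
// Pure sketch over a file list; the real discoverSectionTemplates() walks the repo.
interface SectionTemplate { skill: string; id: string; path: string }

function discoverSectionTemplates(files: string[]): SectionTemplate[] {
  const out: SectionTemplate[] = [];
  for (const path of files) {
    // Match exactly <skill>/sections/<id>.md.tmpl (one level, no nesting).
    const m = /^([^/]+)\/sections\/([^/]+)\.md\.tmpl$/.exec(path);
    if (m) out.push({ skill: m[1], id: m[2], path }); // skill = parent dir, not "sections"
  }
  return out;
}
```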

Inert until a skill is carved (no sections/ dirs exist yet). Refactor is
output-neutral: gen:skill-docs --dry-run --host all reports 0 STALE.

5 discovery unit tests + 389 gen-skill-docs tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(setup): install sections/ for cherry-pick targets (claude + kiro) (T9)

Two install targets cherry-pick SKILL.md and would leave a carved skill's
sections/ behind, so a runtime 'Read sections/<name>.md' would 404:
- link_claude_skill_dirs: link the sections/ subdir via _link_or_copy (windows
  gets a fresh copy on every ./setup)
- kiro per-skill loop: sed-rewrite + copy each sections/* so paths resolve under
  ~/.kiro rather than ~/.codex or ~/.claude
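For illustration, here is the kiro path rewrite expressed in TypeScript. The setup script actually uses sed. `rewriteForKiro` and the exact prefixes it rewrites are assumptions.

```typescript
// Hypothetical equivalent of the per-file sed rewrite in ./setup.
function rewriteForKiro(text: string): string {
  // Point ~/.codex/ and ~/.claude/ prefixes at ~/.kiro/ so section paths resolve there.
  return text.replace(/~\/\.(?:codex|claude)\//g, "~/.kiro/");
}
```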

codex/factory/opencode link the whole generated dir, so sections ride free.
Addresses Codex outside-voice #4/#6 (runtime pathing landmine). Inert until a
skill is carved. Static-tripwire test + windows-fallback invariant green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ship): gstack-version-bump CLI — tested idempotency classify + write (T9)

Hybrid CLI extraction (CM1): the deterministic core of ship Step 12 becomes a
tested CLI instead of bash prose the agent re-derives each run.
- classify: FRESH/ALREADY_BUMPED/DRIFT_STALE_PKG/DRIFT_UNEXPECTED from VERSION
  vs origin/<base>:VERSION vs package.json.version (pure reader)
- write: validated dual-write to VERSION + package.json (FRESH bump)
- repair: DRIFT_STALE_PKG sync, no re-bump
Bump-LEVEL choice and queue-collision handling stay agent judgment; slot picking
stays in bin/gstack-next-version. This moves the re-bump-a-shipped-branch footgun
out of skippable prose and into code that can't be skipped or misread.
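A sketch of the `classify` decision table. It assumes `package.json` mirrors `VERSION` verbatim and that the base version is read from `origin/<base>:VERSION`. The exact rules in bin/gstack-version-bump may differ. The sketch only shows how the three inputs map to exactly one state.

```typescript
type BumpState = "FRESH" | "ALREADY_BUMPED" | "DRIFT_STALE_PKG" | "DRIFT_UNEXPECTED";

// Assumed decision table over three version strings; the real CLI may differ.
function classify(version: string, baseVersion: string, pkgVersion: string): BumpState {
  const bumped = version !== baseVersion; // VERSION differs from origin/<base>:VERSION
  if (!bumped && pkgVersion === version) return "FRESH";              // nothing bumped yet
  if (bumped && pkgVersion === version) return "ALREADY_BUMPED";      // both files bumped
  if (bumped && pkgVersion === baseVersion) return "DRIFT_STALE_PKG"; // package.json lags VERSION
  return "DRIFT_UNEXPECTED";                                          // any other combination
}
```

In this sketch only FRESH leads to a `write` and DRIFT_STALE_PKG to a `repair`. ALREADY_BUMPED is the state that keeps a shipped branch from being bumped twice.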

15 tests (exhaustive state matrix + write/repair fs + real-git classify).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(parity): sectioned-skill parity capability — guards the carve (T9)

Carved skills (skeleton + sections/*.md) need parity checks that see relocated
content, or moving a phrase into a section reads as 'lost':
- readSkillForParity(): union skeleton + all sections/*.md
- checkSkillParity sectioned mode: content checks against the union; minBytes/
  maxSizeRatio against union bytes (total behavior preserved); maxSkeletonBytes
  asserts the always-loaded skeleton actually shrank. Lowering minBytes to fit a
  small skeleton would otherwise make the size floor toothless [Codex #12].
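A sketch of the sectioned size checks. `checkSectionedSize` is hypothetical, and so is `baselineBytes`, which stands for the reference size that `maxSizeRatio` is compared against. The option names follow the commit text.

```typescript
// Option names follow the commit text; baselineBytes is an assumed reference size.
interface SectionedSizeOpts { minBytes: number; maxSizeRatio: number; maxSkeletonBytes: number }

function checkSectionedSize(
  skeleton: string,
  sections: string[],
  baselineBytes: number,
  opts: SectionedSizeOpts,
): string[] {
  const bytes = (s: string): number => new TextEncoder().encode(s).length;
  const skeletonBytes = bytes(skeleton);
  const unionBytes = sections.reduce((total, s) => total + bytes(s), skeletonBytes);
  const errors: string[] = [];
  // Floor and ratio run against the union, so total behavior is still measured...
  if (unionBytes < opts.minBytes) errors.push(`union ${unionBytes}B < minBytes`);
  if (unionBytes / baselineBytes > opts.maxSizeRatio) errors.push("union exceeds maxSizeRatio");
  // ...while the skeleton cap proves the always-loaded part really shrank.
  if (skeletonBytes > opts.maxSkeletonBytes) errors.push(`skeleton ${skeletonBytes}B > maxSkeletonBytes`);
  return errors;
}
```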

Built + tested BEFORE the carve so ship's invariant can flip to sectioned in the
same commit the carve lands. The monolith path is byte-identical (verified: the
pre-existing investigate parity drift, ratio 1.053, fails the same way with this
change stashed).

7 sectioned-parity tests + existing parity tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(ship): carve into skeleton + on-demand sections (Claude) (T9)

ship/SKILL.md drops 167KB → 68.7KB (~59% less always-loaded content) by moving
8 prose-heavy steps into ship/sections/*.md, read on demand:
tests, test-coverage, plan-completion, review-army, greptile, adversarial,
changelog, pr-body. Step 12's version logic now calls the tested
gstack-version-bump CLI instead of inline bash.
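Worked arithmetic for the headline number. About 41% of the original size (68.7 / 167 ≈ 0.411) stays always-loaded:

```latex
\frac{167 - 68.7}{167} = \frac{98.3}{167} \approx 0.589 \approx 59\%
```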

Claude-first (S2): {{SECTION:id}} emits a STOP-Read pointer on Claude (skeleton +
generated section files) and INLINES the content on every other host, so external
hosts keep the full monolith — verified factory at 162KB with no sections dir.
{{SECTION_INDEX:ship}} renders the situation→section table from the PASSIVE
manifest (CM2 / v2_PLAN.md:663); required-reads live only in test fixtures.
Multi-pass resolve expands inlined sections' own resolvers.
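A sketch of per-host `{{SECTION:id}}` handling combined with multi-pass resolution. `expand`, `sectionResolver`, and the STOP pointer wording are hypothetical. An inlined section can carry its own tokens (for example `{{ADVERSARIAL_STEP}}`), so resolution repeats until a pass changes nothing.

```typescript
type Resolver = (arg: string | undefined) => string;

// {{SECTION:id}}: pointer on Claude, inlined body on every other host (hypothetical wording).
function sectionResolver(host: string, bodies: Record<string, string>): Resolver {
  return (id) =>
    host === "claude"
      ? `STOP. Read sections/${id}.md before continuing.`
      : bodies[id ?? ""] ?? "";
}

// Repeat until a pass changes nothing, so resolvers inside inlined bodies also expand.
function expand(text: string, resolvers: Record<string, Resolver>, maxPasses = 5): string {
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = text.replace(
      /\{\{([A-Z_]+)(?::([a-z0-9-]+))?\}\}/g,
      (match: string, name: string, arg: string | undefined) =>
        name in resolvers ? resolvers[name](arg) : match,
    );
    if (next === text) break;
    text = next;
  }
  return text;
}
```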

Parity: ship invariant flipped to sectioned (union content checks + maxSkeletonBytes
asserts the shrink). Carve-fallout fixed across gen-skill-docs/skill-validation/
golden/plan-completion/#1539/size-budget tests via skeleton+sections union reads.
Free suite green except the pre-existing investigate parity drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): manifest-consistency + context-parity + requiredReads helper (T9)

Free deterministic guards for the carve:
- required-reads.ts + unit test: assertRequiredReads(run, requiredFiles) — the
  mechanical layer-5 check that the agent Read the sections its situation needs
  (required set comes from the fixture, not the passive manifest)
- section-manifest-consistency: 3-tier orphan classification (generated orphan +
  hand-edited generated file → FAIL; manifest orphan → WARN per v2_PLAN.md) and
  pins the PASSIVE-manifest contract (no applies_when/required_for)
- template-context-parity: generated sections have zero unresolved placeholders
  and gated resolvers (ADVERSARIAL_STEP/CONFIDENCE_CALIBRATION/CHANGELOG_WORKFLOW)
  rendered — proving sections resolve with the parent skillName, not 'sections'
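The zero-unresolved-placeholders half of template-context-parity can be sketched as a regex scan. `unresolvedPlaceholders` is hypothetical and only covers `{{NAME}}` and `{{NAME:arg}}` tokens. Proving that gated resolvers actually rendered still needs content assertions on top of it.

```typescript
// Returns each distinct {{NAME}} or {{NAME:arg}} token left in generated text.
function unresolvedPlaceholders(generated: string): string[] {
  const found = generated.match(/\{\{[A-Z_]+(?::[a-z0-9-]+)?\}\}/g) ?? [];
  return Array.from(new Set(found));
}
```

This pattern alone would not flag a leaked `${...}` expression, which is the failure mode of the pr-body `${ctx.paths.binDir}` bug.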

16 tests, all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): section-loading E2E + idempotency CLI detection (T9)

- skill-e2e-ship-section-loading.test.ts (new, periodic): runs real /ship in plan
  mode against a fresh version-changing fixture and asserts the agent Read the
  required sections (review-army + changelog). Runs against the INSTALLED skill
  (~/.claude/skills/gstack/ship), not repo paths, so install-layout 404s surface
  [Codex outside-voice #5]. Layer-5 mechanical guard against silent section-skip.
- skill-e2e-ship-idempotency.test.ts: detection updated for the carve — Step 12
  now runs gstack-version-bump classify (JSON "state":"ALREADY_BUMPED") instead
  of the inline bash echo (STATE: ALREADY_BUMPED). Accept both; add a
  gstack-version-bump-write re-bump regression signal.
- touchfiles: register ship-section-loading (periodic) + extend idempotency deps
  with bin/gstack-version-bump + scripts/resolvers/sections.ts.
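A sketch of the dual-format detection under a hypothetical `detectAlreadyBumped` name. Both patterns come directly from the commit text.

```typescript
// Accepts both the post-carve CLI JSON and the legacy inline bash echo.
function detectAlreadyBumped(output: string): boolean {
  const json = /"state"\s*:\s*"ALREADY_BUMPED"/.test(output); // gstack-version-bump classify
  const legacy = /STATE:\s*ALREADY_BUMPED/.test(output);       // pre-carve inline bash
  return json || legacy;
}
```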

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(ship): union-read redaction wiring test for the carve (T9)

main's PR-body redaction-at-sink lives in sections/pr-body.md.tmpl after the
carve, not the skeleton template. Read skeleton + section templates union so the
redaction-wiring assertions follow the relocated content. 9/9 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* v1.54.0.0 feat: carve /ship into skeleton + on-demand sections (-59% always-loaded)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 12:09:10 -07:00