4 Commits

Author SHA1 Message Date
Garry Tan b3eaffce07 feat: context rot defense for /ship — subagent isolation + clean step numbering (v0.18.1.0) (#1030)
* refactor: renumber /ship steps to clean integers (1-20)

Replaces fractional step numbers (1.5, 2.5, 3.25, 3.4, 3.45, 3.47, 3.48,
3.5, 3.55, 3.56, 3.57, 3.75, 3.8, 5.5, 6.5, 8.5, 8.75) with clean
integers 1 through 20, plus allowed resolver sub-steps 8.1, 8.2,
9.1, 9.2, 9.3. Fractional numbering signaled "optional appendix" and
contributed to /ship's habit of skipping late-stage steps.

Affects:
- ship/SKILL.md.tmpl (all headings + ~30 cross-references)
- scripts/resolvers/review.ts (ship-side 3.47/3.48/3.57/3.8 conditionals)
- scripts/resolvers/review-army.ts (ship-side 3.55/3.56 conditionals)
- scripts/resolvers/testing.ts (ship-side 2.5/3.4 references, 5 sites)
- scripts/resolvers/utility.ts (CHANGELOG heading gets Step 13 prefix)
- test/gen-skill-docs.test.ts (5 step-number assertions updated)
- test/skill-validation.test.ts (3 step-number assertions updated)

/review step numbering (1.5, 2.5, 4.5, 5.5-5.8) intentionally unchanged —
only the ship-side of each isShip conditional was updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: subagent isolation for /ship's 4 context-heaviest sub-workflows

Fights context rot. By late /ship, the parent context is bloated with
500-1,750 lines of intermediate tool output from tests, coverage audits,
reviews, adversarial checks, and PR body construction. The model is
at its least intelligent when it reaches doc-sync — which is why
/document-release was being skipped ~80% of the time.

Applies subagent dispatch (proven pattern from Review Army at Step 9.1
and Adversarial at Step 11) to four sub-workflows where the parent
only needs the conclusion, not the intermediate output:

- Step 7 (Test Coverage Audit) — subagent returns coverage_pct, gaps,
  diagram, tests_added
- Step 8 (Plan Completion Audit) — subagent returns total_items, done,
  changed, deferred, summary
- Step 10 (Greptile Triage) — subagent fetches + classifies, parent
  handles user interaction and commits fixes (AskUserQuestion + Edit
  can't run in subagents)
- Step 18 (Documentation Sync) — subagent invokes full /document-release
  skill in fresh context; parent embeds documentation_section in PR body

Sequencing fix for Step 18: runs AFTER Step 17 (Push) and BEFORE Step 19
(Create PR). The PR is created once from final HEAD with the
## Documentation section baked into the initial body — no create-then-
re-edit dance, no race conditions with document-release's own PR body
editor.

Adds "You are NOT done" guardrail after Step 17 (Push) to break the
natural stopping point that currently causes doc-release skips.

Each subagent falls back to inline execution if it fails or returns
invalid JSON. /ship never blocks on subagent failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: regression guard for /ship step numbering

Three regression guards in skill-validation.test.ts to prevent future
drift back to fractional step numbering:

1. ship/SKILL.md.tmpl contains no fractional step numbers except the
   allowed resolver sub-steps (8.1, 8.2, 9.1, 9.2, 9.3). A contributor
   adding "Step 3.75" next month will fail this test with a clear error.

2. ship/SKILL.md main headings use clean integer step numbers. If a
   renumber accidentally leaves a decimal heading, this catches it.

3. review/SKILL.md step numbers unchanged — regression guard for the
   resolver conditionals in review.ts/review-army.ts. If a future edit
   accidentally touches the review-side of an isShip ternary, /review's
   fractional numbering (1.5, 4.5, 5.7) would vanish. This test catches
   that cross-contamination.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync ship step references after renumber

CLAUDE.md: "At /ship time (Step 5)" → "(Step 13)" — CHANGELOG is now
  explicitly Step 13 after the renumber (was implicit between old
  Step 4 and Step 5.5).
TODOS.md: "Step 3.4 coverage audit" → "Step 7" — references the open
  TODO for auto-upgrading ★-rated tests, which hooks into the coverage
  audit step.

Both are historical references to ship's step numbering that became
stale when clean integer renumbering landed in 566d42c2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: update golden ship skill baselines after renumber + subagent refactor

The golden fixtures at test/fixtures/golden/{claude,codex,factory}-ship-SKILL.md
regression-test that generated ship/SKILL.md output matches a committed baseline.
After renumbering steps to clean integers and converting 4 sub-workflows to
subagent dispatches, the generated output changed substantially — refresh the
baselines to reflect the new expected output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.18.1.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: gitignore Claude Code harness runtime artifacts

.claude/scheduled_tasks.lock appears when ScheduleWakeup fires. It's a
runtime lock file owned by the Claude Code harness, not project source.
Add .claude/*.lock too so future harness artifacts in that directory
don't need their own gitignore entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 23:14:03 -07:00
Garry Tan 422f172fbb feat: ship re-run executes all verification checks (v0.15.10.0) (#833)
* feat: review army idempotency + cross-review dedup resolver

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: ship re-run executes all checks, adds review army + dedup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: regression guards for ship specialist dispatch + dedup + idempotency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.15.10.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:43:13 -07:00
Garry Tan 9ca8f1d7a9 feat: adaptive gating + cross-review dedup for review army (v0.15.2.0) (#760)
* feat: add test_stub optional field to specialist finding schema

All specialist prompts now document test_stub as an optional output field,
enabling specialists to suggest test code alongside findings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: adaptive gating + test framework detection for review army

Adds gstack-specialist-stats binary for tracking specialist hit rates.
Resolver now detects test framework for test_stub generation, applies
adaptive gating to skip silent specialists, and compiles per-specialist
stats for the review-log entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: cross-review finding dedup + test stub override + enriched review-log

Step 5.0 suppresses findings previously skipped by the user when the
relevant code hasn't changed. Test stub findings force ASK classification
so users approve test creation. Review-log now includes quality_score,
per-specialist stats, and per-finding action records.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.15.2.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: bash operator precedence in test framework detection

[ -f a ] || [ -f b ] && X="y" evaluates as A || (B && C), so the
assignment only runs when the second test passes. Wrap the OR group
in braces: { [ -f a ] || [ -f b ]; } && X="y".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:46:21 -07:00
Garry Tan a4a181ca92 feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692)
* feat: extend gstack-diff-scope with SCOPE_MIGRATIONS, SCOPE_API, SCOPE_AUTH

Three new scope signals for Review Army specialist activation:
- SCOPE_MIGRATIONS: db/migrate/, prisma/migrations/, alembic/, *.sql
- SCOPE_API: *controller*, *route*, *endpoint*, *.graphql, openapi.*
- SCOPE_AUTH: *auth*, *session*, *jwt*, *oauth*, *permission*, *role*

* feat: add 7 specialist checklist files for Review Army

- testing.md (always-on): coverage gaps, flaky patterns, security enforcement
- maintainability.md (always-on): dead code, DRY, stale comments
- security.md (conditional): OWASP deep analysis, auth bypass, injection
- performance.md (conditional): N+1 queries, bundle impact, complexity
- data-migration.md (conditional): reversibility, lock duration, backfill
- api-contract.md (conditional): breaking changes, versioning, error format
- red-team.md (conditional): adversarial analysis, cross-cutting concerns

All use standard header with JSON output schema and NO FINDINGS fallback.

* feat: Review Army resolver — parallel specialist dispatch + merge

New resolver in review-army.ts generates template prose for:
- Stack detection and specialist selection
- Parallel Agent tool dispatch with learning-informed prompts
- JSON finding collection, fingerprint dedup, consensus highlighting
- PR quality score computation
- Red Team conditional dispatch

Registered as REVIEW_ARMY in resolvers/index.ts.

* refactor: restructure /review template for Review Army

- Replace Steps 4-4.75 with CRITICAL pass + {{REVIEW_ARMY}}
- Remove {{DESIGN_REVIEW_LITE}} and {{TEST_COVERAGE_AUDIT_REVIEW}}
  (subsumed into Design and Testing specialists respectively)
- Extract specialist-covered categories from checklist.md
- Keep CRITICAL + uncovered INFORMATIONAL in main agent pass

* test: Review Army — 14 diff-scope tests + 7 E2E tests

- test/diff-scope.test.ts: 14 tests for all 9 scope signals
- test/skill-e2e-review-army.test.ts: 7 E2E tests
  Gate: migration safety, N+1 detection, delivery audit,
        quality score, JSON findings
  Periodic: red team, consensus
- Updated gen-skill-docs tests for new review structure
- Added touchfile entries and tier classifications

* docs: update SELF_LEARNING_V0.md with Release 2 status + Release 2.5

Mark Release 2 (Review Army) as in-progress. Add Release 2.5 for
deferred expansions (E1 adaptive gating, E3 test stubs, E5 cross-review
dedup, E7 specialist tracking).

* chore: bump version and changelog (v0.14.3.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 22:07:50 -06:00