diff --git a/ship/SKILL.md b/ship/SKILL.md
index 8a477912..ce61d03f 100644
--- a/ship/SKILL.md
+++ b/ship/SKILL.md
@@ -241,7 +241,7 @@ Fetch and merge the base branch into the feature branch so tests run against the
git fetch origin && git merge origin/ --no-edit
```
-**If there are merge conflicts:** Try to auto-resolve if they are simple (VERSION, schema.rb, CHANGELOG ordering). If conflicts are complex or ambiguous, **STOP** and show them.
+**If there are merge conflicts:** Try to auto-resolve if they are simple (VERSION, CHANGELOG ordering). If conflicts are complex or ambiguous, **STOP** and show them.
**If already up to date:** Continue silently.
@@ -406,85 +406,63 @@ Only commit if there are changes. Stage all bootstrap files (config, test direct
## Step 3: Run tests (on merged code)
-**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
-`db:test:prepare` internally, which loads the schema into the correct lane database.
-Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
+Read CLAUDE.md. Look for a `## Testing` section or test commands in `## Commands`.
+Extract all test commands (lines in bash code blocks that run tests — e.g., `bun test`,
+`npm test`, `pytest`, `go test ./...`, `cargo test`, `bin/rails test`).
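+
+A minimal sketch of one way to pull those commands out, assuming the `## Testing`
+section uses standard triple-backtick bash fences (adjust if the layout differs):
+
+```bash
+# Print the lines inside bash code blocks under "## Testing" in CLAUDE.md.
+awk '/^## Testing/{flag=1; next} /^## /{flag=0} flag' CLAUDE.md \
+  | sed -n '/^```bash/,/^```/p' | grep -v '^```'
+```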
-Run both test suites in parallel:
+Run all discovered test commands in parallel, each piped to a unique /tmp file:
```bash
-bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
-npm run test 2>&1 | tee /tmp/ship_vitest.txt &
+{test_command_1} 2>&1 | tee /tmp/ship_tests_1.txt &
+{test_command_2} 2>&1 | tee /tmp/ship_tests_2.txt &
wait
```
-After both complete, read the output files and check pass/fail.
+After all complete, read the output files and check pass/fail.
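+
+To also catch non-zero exit codes (which the `tee` pipes otherwise mask), one possible
+variant of the block above, assuming bash with `pipefail`:
+
+```bash
+set -o pipefail
+{test_command_1} 2>&1 | tee /tmp/ship_tests_1.txt & pid1=$!
+{test_command_2} 2>&1 | tee /tmp/ship_tests_2.txt & pid2=$!
+wait "$pid1" || fail1=1    # set if suite 1 failed
+wait "$pid2" || fail2=1    # set if suite 2 failed
+```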
**If any test fails:** Show the failures and **STOP**. Do not proceed.
**If all pass:** Continue silently — just note the counts briefly.
+**If CLAUDE.md has no test commands:** Use AskUserQuestion:
+
+"I couldn't find test commands in CLAUDE.md. I need to know how to run tests before
+I can ship. Options:
+A) Let me search the repo — I'll look at package.json, Makefile, Gemfile, etc.,
+ figure out the test commands, and add a ## Testing section to CLAUDE.md so we
+ never have to ask again.
+B) Tell me the commands — type them and I'll add them to CLAUDE.md.
+C) This project has no tests — skip testing and continue shipping.
+RECOMMENDATION: Choose A because it's a one-time cost that prevents this question from coming up again."
+
+If A: Search the repo for test infrastructure (package.json scripts, Makefile targets,
+Gemfile test gems, pytest/pyproject.toml config, go.mod, Cargo.toml, CI workflow files).
+Determine the correct test commands. Write a `## Testing` section to CLAUDE.md with the
+discovered commands. Then re-run Step 3 with those commands.
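+
+An illustrative sketch of that search (the file list is an assumption, not exhaustive):
+
+```bash
+grep -n '"test' package.json 2>/dev/null              # npm/bun/yarn script names
+grep -nE '^(test|check)[^=:]*:' Makefile 2>/dev/null  # make targets
+grep -nE 'pytest|tox' pyproject.toml setup.cfg 2>/dev/null
+ls .github/workflows/ 2>/dev/null                     # CI usually runs the canonical command
+```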
+
+If B: User provides commands. Write them to CLAUDE.md `## Testing` section. Re-run Step 3.
+
+If C: Skip tests with a warning and continue to Step 3.25.
+
---
## Step 3.25: Eval Suites (conditional)
-Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
+Read CLAUDE.md. Look for a `## Evals` section, or eval-related commands in `## Testing`
+or `## Commands` (identified by keywords: "eval", "evals", "judge", "llm-judge").
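+
+A quick way to spot candidates (a sketch; the keyword list mirrors the ones above):
+
+```bash
+grep -inE 'eval|judge' CLAUDE.md
+```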
-**1. Check if the diff touches prompt-related files:**
+If an eval command is found:
+ Run it. The project's eval system handles diff-based file selection internally.
-```bash
-git diff origin/ --name-only
-```
+ ```bash
+ {eval_command} 2>&1 | tee /tmp/ship_evals.txt
+ ```
-Match against these patterns (from CLAUDE.md):
-- `app/services/*_prompt_builder.rb`
-- `app/services/*_generation_service.rb`, `*_writer_service.rb`, `*_designer_service.rb`
-- `app/services/*_evaluator.rb`, `*_scorer.rb`, `*_classifier_service.rb`, `*_analyzer.rb`
-- `app/services/concerns/*voice*.rb`, `*writing*.rb`, `*prompt*.rb`, `*token*.rb`
-- `app/services/chat_tools/*.rb`, `app/services/x_thread_tools/*.rb`
-- `config/system_prompts/*.txt`
-- `test/evals/**/*` (eval infrastructure changes affect all suites)
+ If the eval fails → show failures and **STOP**. Do not proceed.
+ If it passes → note pass counts and cost. Continue to Step 3.4.
-**If no matches:** Print "No prompt-related files changed — skipping evals." and continue to Step 3.5.
-
-**2. Identify affected eval suites:**
-
-Each eval runner (`test/evals/*_eval_runner.rb`) declares `PROMPT_SOURCE_FILES` listing which source files affect it. Grep these to find which suites match the changed files:
-
-```bash
-grep -l "changed_file_basename" test/evals/*_eval_runner.rb
-```
-
-Map runner → test file: `post_generation_eval_runner.rb` → `post_generation_eval_test.rb`.
-
-**Special cases:**
-- Changes to `test/evals/judges/*.rb`, `test/evals/support/*.rb`, or `test/evals/fixtures/` affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
-- Changes to `config/system_prompts/*.txt` — grep eval runners for the prompt filename to find affected suites.
-- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.
-
-**3. Run affected suites at `EVAL_JUDGE_TIER=full`:**
-
-`/ship` is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
-
-```bash
-EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
-```
-
-If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
-
-**4. Check results:**
-
-- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
-- **If all pass:** Note pass counts and cost. Continue to Step 3.5.
-
-**5. Save eval output** — include eval results and cost dashboard in the PR body (Step 8).
-
-**Tier reference (for context — /ship always uses `full`):**
-| Tier | When | Speed (cached) | Cost |
-|------|------|----------------|------|
-| `fast` (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
-| `standard` (Sonnet) | Default dev, `bin/test-lane --eval` | ~17s (4x faster) | ~$0.37/run |
-| `full` (Opus persona) | **`/ship` and pre-merge** | ~72s (baseline) | ~$1.27/run |
+If no eval command is found:
+ Skip silently — most projects don't have eval suites. Continue to Step 3.4.
---
@@ -915,8 +893,7 @@ gh pr create --base --title ": " --body "$(cat <<'EOF'
## Test plan
-- [x] All Rails tests pass (N runs, 0 failures)
-- [x] All Vitest tests pass (N tests)
+- [x] All tests pass (list each test suite: command name, pass/fail counts)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
EOF
diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl
index 202da3bb..f7247b2c 100644
--- a/ship/SKILL.md.tmpl
+++ b/ship/SKILL.md.tmpl
@@ -87,7 +87,7 @@ Fetch and merge the base branch into the feature branch so tests run against the
git fetch origin && git merge origin/ --no-edit
```
-**If there are merge conflicts:** Try to auto-resolve if they are simple (VERSION, schema.rb, CHANGELOG ordering). If conflicts are complex or ambiguous, **STOP** and show them.
+**If there are merge conflicts:** Try to auto-resolve if they are simple (VERSION, CHANGELOG ordering). If conflicts are complex or ambiguous, **STOP** and show them.
**If already up to date:** Continue silently.
@@ -101,85 +101,63 @@ git fetch origin && git merge origin/ --no-edit
## Step 3: Run tests (on merged code)
-**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
-`db:test:prepare` internally, which loads the schema into the correct lane database.
-Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
+Read CLAUDE.md. Look for a `## Testing` section or test commands in `## Commands`.
+Extract all test commands (lines in bash code blocks that run tests — e.g., `bun test`,
+`npm test`, `pytest`, `go test ./...`, `cargo test`, `bin/rails test`).
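+
+A minimal sketch of one way to pull those commands out, assuming the `## Testing`
+section uses standard triple-backtick bash fences (adjust if the layout differs):
+
+```bash
+# Print the lines inside bash code blocks under "## Testing" in CLAUDE.md.
+awk '/^## Testing/{flag=1; next} /^## /{flag=0} flag' CLAUDE.md \
+  | sed -n '/^```bash/,/^```/p' | grep -v '^```'
+```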
-Run both test suites in parallel:
+Run all discovered test commands in parallel, each piped to a unique /tmp file:
```bash
-bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
-npm run test 2>&1 | tee /tmp/ship_vitest.txt &
+{test_command_1} 2>&1 | tee /tmp/ship_tests_1.txt &
+{test_command_2} 2>&1 | tee /tmp/ship_tests_2.txt &
wait
```
-After both complete, read the output files and check pass/fail.
+After all complete, read the output files and check pass/fail.
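+
+To also catch non-zero exit codes (which the `tee` pipes otherwise mask), one possible
+variant of the block above, assuming bash with `pipefail`:
+
+```bash
+set -o pipefail
+{test_command_1} 2>&1 | tee /tmp/ship_tests_1.txt & pid1=$!
+{test_command_2} 2>&1 | tee /tmp/ship_tests_2.txt & pid2=$!
+wait "$pid1" || fail1=1    # set if suite 1 failed
+wait "$pid2" || fail2=1    # set if suite 2 failed
+```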
**If any test fails:** Show the failures and **STOP**. Do not proceed.
**If all pass:** Continue silently — just note the counts briefly.
+**If CLAUDE.md has no test commands:** Use AskUserQuestion:
+
+"I couldn't find test commands in CLAUDE.md. I need to know how to run tests before
+I can ship. Options:
+A) Let me search the repo — I'll look at package.json, Makefile, Gemfile, etc.,
+ figure out the test commands, and add a ## Testing section to CLAUDE.md so we
+ never have to ask again.
+B) Tell me the commands — type them and I'll add them to CLAUDE.md.
+C) This project has no tests — skip testing and continue shipping.
+RECOMMENDATION: Choose A because it's a one-time cost that prevents this question from coming up again."
+
+If A: Search the repo for test infrastructure (package.json scripts, Makefile targets,
+Gemfile test gems, pytest/pyproject.toml config, go.mod, Cargo.toml, CI workflow files).
+Determine the correct test commands. Write a `## Testing` section to CLAUDE.md with the
+discovered commands. Then re-run Step 3 with those commands.
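+
+An illustrative sketch of that search (the file list is an assumption, not exhaustive):
+
+```bash
+grep -n '"test' package.json 2>/dev/null              # npm/bun/yarn script names
+grep -nE '^(test|check)[^=:]*:' Makefile 2>/dev/null  # make targets
+grep -nE 'pytest|tox' pyproject.toml setup.cfg 2>/dev/null
+ls .github/workflows/ 2>/dev/null                     # CI usually runs the canonical command
+```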
+
+If B: User provides commands. Write them to CLAUDE.md `## Testing` section. Re-run Step 3.
+
+If C: Skip tests with a warning and continue to Step 3.25.
+
---
## Step 3.25: Eval Suites (conditional)
-Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
+Read CLAUDE.md. Look for a `## Evals` section, or eval-related commands in `## Testing`
+or `## Commands` (identified by keywords: "eval", "evals", "judge", "llm-judge").
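+
+A quick way to spot candidates (a sketch; the keyword list mirrors the ones above):
+
+```bash
+grep -inE 'eval|judge' CLAUDE.md
+```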
-**1. Check if the diff touches prompt-related files:**
+If an eval command is found:
+ Run it. The project's eval system handles diff-based file selection internally.
-```bash
-git diff origin/ --name-only
-```
+ ```bash
+ {eval_command} 2>&1 | tee /tmp/ship_evals.txt
+ ```
-Match against these patterns (from CLAUDE.md):
-- `app/services/*_prompt_builder.rb`
-- `app/services/*_generation_service.rb`, `*_writer_service.rb`, `*_designer_service.rb`
-- `app/services/*_evaluator.rb`, `*_scorer.rb`, `*_classifier_service.rb`, `*_analyzer.rb`
-- `app/services/concerns/*voice*.rb`, `*writing*.rb`, `*prompt*.rb`, `*token*.rb`
-- `app/services/chat_tools/*.rb`, `app/services/x_thread_tools/*.rb`
-- `config/system_prompts/*.txt`
-- `test/evals/**/*` (eval infrastructure changes affect all suites)
+ If the eval fails → show failures and **STOP**. Do not proceed.
+ If it passes → note pass counts and cost. Continue to Step 3.4.
-**If no matches:** Print "No prompt-related files changed — skipping evals." and continue to Step 3.5.
-
-**2. Identify affected eval suites:**
-
-Each eval runner (`test/evals/*_eval_runner.rb`) declares `PROMPT_SOURCE_FILES` listing which source files affect it. Grep these to find which suites match the changed files:
-
-```bash
-grep -l "changed_file_basename" test/evals/*_eval_runner.rb
-```
-
-Map runner → test file: `post_generation_eval_runner.rb` → `post_generation_eval_test.rb`.
-
-**Special cases:**
-- Changes to `test/evals/judges/*.rb`, `test/evals/support/*.rb`, or `test/evals/fixtures/` affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
-- Changes to `config/system_prompts/*.txt` — grep eval runners for the prompt filename to find affected suites.
-- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.
-
-**3. Run affected suites at `EVAL_JUDGE_TIER=full`:**
-
-`/ship` is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
-
-```bash
-EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
-```
-
-If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
-
-**4. Check results:**
-
-- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
-- **If all pass:** Note pass counts and cost. Continue to Step 3.5.
-
-**5. Save eval output** — include eval results and cost dashboard in the PR body (Step 8).
-
-**Tier reference (for context — /ship always uses `full`):**
-| Tier | When | Speed (cached) | Cost |
-|------|------|----------------|------|
-| `fast` (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
-| `standard` (Sonnet) | Default dev, `bin/test-lane --eval` | ~17s (4x faster) | ~$0.37/run |
-| `full` (Opus persona) | **`/ship` and pre-merge** | ~72s (baseline) | ~$1.27/run |
+If no eval command is found:
+ Skip silently — most projects don't have eval suites. Continue to Step 3.4.
---
@@ -579,8 +557,7 @@ gh pr create --base --title ": " --body "$(cat <<'EOF'
## Test plan
-- [x] All Rails tests pass (N runs, 0 failures)
-- [x] All Vitest tests pass (N tests)
+- [x] All tests pass (list each test suite: command name, pass/fail counts)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
EOF