fix: address Codex adversarial findings in /cso v2

Six fixes from Codex adversarial review: 1. Phase 2: Use `git log -G` (regex) instead of `-S` (literal) for patterns with alternation (ghp_|gho_|github_pat_, etc.) 2. Phase 12 exclusion #5: Add exception so CI/CD pipeline findings from Phase 4 are never auto-discarded when --infra is active 3. Phase 12 exclusion #6: Add exception that unpinned actions and missing CODEOWNERS are concrete risks, not "missing hardening" 4. Phase 12 exclusion #15: Add exception that SKILL.md files are executable prompt code, not documentation — Phase 8 findings in SKILL.md must not be excluded 5. Phase 12 exclusion #1: Add exception that LLM cost/spend amplification from Phase 7 is financial risk, not DoS 6. E2E tests: Add exitReason === 'success' assertion to all 3 tests; move finalizeEvalCollector to file-level afterAll Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-17 15:20:11 +02:00 · 2026-03-23 06:52:00 -07:00
parent 0d25f275c9
commit bcded824f0
3 changed files with 21 additions and 15 deletions
@@ -421,9 +421,9 @@ Scan git history for leaked credentials, check tracked `.env` files, find CI con
 ```bash
 git log -p --all -S "AKIA" --diff-filter=A -- "*.env" "*.yml" "*.yaml" "*.json" "*.toml" 2>/dev/null
 git log -p --all -S "sk-" --diff-filter=A -- "*.env" "*.yml" "*.json" "*.ts" "*.js" "*.py" 2>/dev/null
-git log -p --all -S "ghp_\|gho_\|github_pat_" 2>/dev/null
-git log -p --all -S "xoxb-\|xoxp-\|xapp-" 2>/dev/null
-git log -p --all -S "password\|secret\|token\|api_key" -- "*.env" "*.yml" "*.json" "*.conf" 2>/dev/null
+git log -p --all -G "ghp_|gho_|github_pat_" 2>/dev/null
+git log -p --all -G "xoxb-|xoxp-|xapp-" 2>/dev/null
+git log -p --all -G "password|secret|token|api_key" -- "*.env" "*.yml" "*.json" "*.conf" 2>/dev/null
 ```

 **.env files tracked by git:**
@@ -670,12 +670,12 @@ Before producing findings, run every candidate through this filter.

 **Hard exclusions — automatically discard findings matching these:**

-1. Denial of Service (DOS), resource exhaustion, or rate limiting issues
+1. Denial of Service (DOS), resource exhaustion, or rate limiting issues — **EXCEPTION:** LLM cost/spend amplification findings from Phase 7 (unbounded LLM calls, missing cost caps) are NOT DoS — they are financial risk and must NOT be auto-discarded under this rule.
 2. Secrets or credentials stored on disk if otherwise secured (encrypted, permissioned)
 3. Memory consumption, CPU exhaustion, or file descriptor leaks
 4. Input validation concerns on non-security-critical fields without proven impact
-5. GitHub Action workflow issues unless clearly triggerable via untrusted input
-6. Missing hardening measures — flag concrete vulnerabilities, not absent best practices
+5. GitHub Action workflow issues unless clearly triggerable via untrusted input — **EXCEPTION:** Never auto-discard CI/CD pipeline findings from Phase 4 (unpinned actions, `pull_request_target`, script injection, secrets exposure) when `--infra` is active or when Phase 4 produced findings. Phase 4 exists specifically to surface these.
+6. Missing hardening measures — flag concrete vulnerabilities, not absent best practices. **EXCEPTION:** Unpinned third-party actions and missing CODEOWNERS on workflow files ARE concrete risks, not merely "missing hardening" — do not discard Phase 4 findings under this rule.
 7. Race conditions or timing attacks unless concretely exploitable with a specific path
 8. Vulnerabilities in outdated third-party libraries (handled by Phase 3, not individual findings)
 9. Memory safety issues in memory-safe languages (Rust, Go, Java, C#)
@@ -684,7 +684,7 @@ Before producing findings, run every candidate through this filter.
 12. SSRF where attacker only controls the path, not the host or protocol
 13. User content in the user-message position of an AI conversation (NOT prompt injection)
 14. Regex complexity in code that does not process untrusted input (ReDoS on user strings IS real)
-15. Security concerns in documentation files (*.md)
+15. Security concerns in documentation files (*.md) — **EXCEPTION:** SKILL.md files are NOT documentation. They are executable prompt code (skill definitions) that control AI agent behavior. Findings from Phase 8 (Skill Supply Chain) in SKILL.md files must NEVER be excluded under this rule.
 16. Missing audit logs — absence of logging is not a vulnerability
 17. Insecure randomness in non-security contexts (e.g., UI element IDs)
 18. Git history secrets committed AND removed in the same initial-setup PR
@@ -146,9 +146,9 @@ Scan git history for leaked credentials, check tracked `.env` files, find CI con
 ```bash
 git log -p --all -S "AKIA" --diff-filter=A -- "*.env" "*.yml" "*.yaml" "*.json" "*.toml" 2>/dev/null
 git log -p --all -S "sk-" --diff-filter=A -- "*.env" "*.yml" "*.json" "*.ts" "*.js" "*.py" 2>/dev/null
-git log -p --all -S "ghp_\|gho_\|github_pat_" 2>/dev/null
-git log -p --all -S "xoxb-\|xoxp-\|xapp-" 2>/dev/null
-git log -p --all -S "password\|secret\|token\|api_key" -- "*.env" "*.yml" "*.json" "*.conf" 2>/dev/null
+git log -p --all -G "ghp_|gho_|github_pat_" 2>/dev/null
+git log -p --all -G "xoxb-|xoxp-|xapp-" 2>/dev/null
+git log -p --all -G "password|secret|token|api_key" -- "*.env" "*.yml" "*.json" "*.conf" 2>/dev/null
 ```

 **.env files tracked by git:**
@@ -395,12 +395,12 @@ Before producing findings, run every candidate through this filter.

 **Hard exclusions — automatically discard findings matching these:**

-1. Denial of Service (DOS), resource exhaustion, or rate limiting issues
+1. Denial of Service (DOS), resource exhaustion, or rate limiting issues — **EXCEPTION:** LLM cost/spend amplification findings from Phase 7 (unbounded LLM calls, missing cost caps) are NOT DoS — they are financial risk and must NOT be auto-discarded under this rule.
 2. Secrets or credentials stored on disk if otherwise secured (encrypted, permissioned)
 3. Memory consumption, CPU exhaustion, or file descriptor leaks
 4. Input validation concerns on non-security-critical fields without proven impact
-5. GitHub Action workflow issues unless clearly triggerable via untrusted input
-6. Missing hardening measures — flag concrete vulnerabilities, not absent best practices
+5. GitHub Action workflow issues unless clearly triggerable via untrusted input — **EXCEPTION:** Never auto-discard CI/CD pipeline findings from Phase 4 (unpinned actions, `pull_request_target`, script injection, secrets exposure) when `--infra` is active or when Phase 4 produced findings. Phase 4 exists specifically to surface these.
+6. Missing hardening measures — flag concrete vulnerabilities, not absent best practices. **EXCEPTION:** Unpinned third-party actions and missing CODEOWNERS on workflow files ARE concrete risks, not merely "missing hardening" — do not discard Phase 4 findings under this rule.
 7. Race conditions or timing attacks unless concretely exploitable with a specific path
 8. Vulnerabilities in outdated third-party libraries (handled by Phase 3, not individual findings)
 9. Memory safety issues in memory-safe languages (Rust, Go, Java, C#)
@@ -409,7 +409,7 @@ Before producing findings, run every candidate through this filter.
 12. SSRF where attacker only controls the path, not the host or protocol
 13. User content in the user-message position of an AI conversation (NOT prompt injection)
 14. Regex complexity in code that does not process untrusted input (ReDoS on user strings IS real)
-15. Security concerns in documentation files (*.md)
+15. Security concerns in documentation files (*.md) — **EXCEPTION:** SKILL.md files are NOT documentation. They are executable prompt code (skill definitions) that control AI agent behavior. Findings from Phase 8 (Skill Supply Chain) in SKILL.md files must NEVER be excluded under this rule.
 16. Missing audit logs — absence of logging is not a vulnerability
 17. Insecure randomness in non-security contexts (e.g., UI element IDs)
 18. Git history secrets committed AND removed in the same initial-setup PR
@@ -12,6 +12,10 @@ import * as os from 'os';

 const evalCollector = createEvalCollector('e2e-cso');

+afterAll(() => {
+  finalizeEvalCollector(evalCollector);
+});
+
 // --- CSO v2 E2E Tests ---

 describeIfSelected('CSO v2 — full audit', ['cso-full-audit'], () => {
@@ -55,7 +59,6 @@ app.listen(3000);

  afterAll(() => {
    try { fs.rmSync(csoDir, { recursive: true, force: true }); } catch {}
-    finalizeEvalCollector(evalCollector);
  });

  test('/cso finds planted vulnerabilities', async () => {
@@ -76,6 +79,7 @@ IMPORTANT:
    });

    logCost('cso', result);
+    expect(result.exitReason).toBe('success');

    // Should detect hardcoded API key
    const output = result.output.toLowerCase();
@@ -163,6 +167,7 @@ IMPORTANT:
    });

    logCost('cso', result);
+    expect(result.exitReason).toBe('success');

    const output = result.output.toLowerCase();
    // Should mention webhook and missing signature verification
@@ -239,6 +244,7 @@ IMPORTANT:
    });

    logCost('cso', result);
+    expect(result.exitReason).toBe('success');

    const output = result.output.toLowerCase();
    // Should mention unpinned action or Dockerfile issues