gstack/cso/ACKNOWLEDGEMENTS.md
Garry Tan 3d1e8e0eac feat: /cso v2 — infrastructure-first security audit (v0.11.6.0) (#384)
* feat: /cso v2 — infrastructure-first security audit

Rewrite /cso from code-centric OWASP scanning to infrastructure-first
attack surface analysis. 15 phases covering secrets archaeology, dependency
supply chain, CI/CD pipeline security, webhook verification, LLM/AI
security, skill supply chain scanning, plus OWASP Top 10, STRIDE, and
data classification.

Key design decisions from eng review + Codex adversarial review:
- Soft gate stack detection (prioritize, don't skip)
- Error on conflicting scope flags (never silently ignore)
- Permission gate before scanning ~/.claude/skills/
- Graceful degradation when audit tools aren't installed
- Finding fingerprints for cross-run trend tracking (sketched below)
- Variant analysis: one verified vuln triggers codebase-wide search
- Dual confidence modes: daily (8/10 gate) vs comprehensive (2/10)
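
The fingerprint idea, roughly (a minimal sketch under assumed field names; the actual schema isn't shown in this PR):

```ts
import { createHash } from "node:crypto";

// Illustrative shape only; field names are assumptions, not the shipped schema.
interface Finding {
  ruleId: string;   // e.g. "secrets/github-token"
  file: string;     // path relative to repo root
  snippet: string;  // matched source excerpt
}

// Fingerprint on rule + file + whitespace-normalized snippet rather than line
// numbers, so a finding keeps its identity across runs as the file shifts.
function fingerprint(f: Finding): string {
  const normalized = f.snippet.replace(/\s+/g, " ").trim();
  return createHash("sha256")
    .update(`${f.ruleId}:${f.file}:${normalized}`)
    .digest("hex")
    .slice(0, 16);
}
```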

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: /cso v2 acknowledgements — 10 projects that informed the design

Credits: Sentry (confidence gating), Trail of Bits (mental model + variant
analysis), Shannon/Keygraph (active verification validation), afiqiqmal
(framework detection + LLM security), Snyk ToxicSkills (skill supply chain),
Miessler PAI (incident playbooks), McGo (report format), Claude Code
Security Pack (modular validation), Anthropic CCS (500+ zero-days), and
@gus_argon (v1 blind spot identification).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: /cso v2 E2E tests — full audit, diff mode, infra scope

Three E2E test cases with planted vulnerabilities:
- cso-full-audit: hardcoded API key + .env tracked by git (fixture sketched below)
- cso-diff-mode: webhook without signature verification on feature branch
- cso-infra-scope: unpinned GitHub Action + Dockerfile without USER
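
For illustration, the cso-full-audit plant could look something like this (file name and key are invented; the real fixtures aren't shown here):

```ts
// fixtures/cso-full-audit/src/billing.ts -- hypothetical planted vulnerability.
// The secrets phase should flag the hardcoded live-looking key at HIGH
// confidence; a git-tracked .env alongside it is the second plant.
const STRIPE_SECRET_KEY = "sk_live_FAKE_FAKE_FAKE_FAKE_FAKE"; // deliberately fake

export function chargeCustomer(customerId: string, cents: number): string {
  // Real code would call the payment API; the test only needs the key in source.
  return `charged ${customerId} ${cents} using ${STRIPE_SECRET_KEY.slice(0, 8)}...`;
}
```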

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /cso E2E tests — correct logCost and recordE2E signatures

logCost requires (label, result), recordE2E requires (collector, name,
suite, result). Fixed all 3 test cases.
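
Illustratively, the corrected call shapes (parameter types below are assumptions inferred from the commit message, not the project's real definitions):

```ts
// Assumed stand-in types; the real ones live in the test harness.
type RunResult = { exitReason: string; costUsd: number };
type EvalCollector = { record(entry: object): void };

declare function logCost(label: string, result: RunResult): void;
declare function recordE2E(
  collector: EvalCollector,
  name: string,
  suite: string,
  result: RunResult,
): void;

declare const collector: EvalCollector;
declare const result: RunResult;

// Before the fix, arguments were missing or misordered; the corrected calls:
logCost("cso-full-audit", result);
recordE2E(collector, "cso-full-audit", "cso", result);
```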

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /cso infra E2E test — increase timeout to 360s

The infra scope test runs Agent sub-tasks for parallel finding
verification, which can take longer than 240s. Increased maxTurns
from 25 to 60 and the timeout from 240s to 360s.
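
In test-config terms (option names and units are assumptions based on the commit message):

```ts
// Hypothetical options object for the infra scope test.
const infraScopeTestOptions = {
  maxTurns: 60,        // was 25: parallel verification subagents need headroom
  timeoutMs: 360_000,  // was 240_000 (240s); millisecond units are an assumption
};
```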

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /cso infra E2E test — sharper prompt to prevent exploration waste

The agent was burning 30+ turns exploring a 3-file repo (18 Glob calls,
an Explore subagent, 4 SKILL.md reads) before starting the audit. Two Agent
verification subagents then ate ~100s, causing the 240s timeout.

Fix: tell the agent the repo is tiny, list the exact files, skip the
preamble, remove Agent from allowed tools, reduce maxTurns 60→30.
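
A sketch of the tightened setup (prompt wording, file names, and option names are paraphrased assumptions, not the committed test verbatim):

```ts
// Tell the agent exactly what exists so it audits instead of exploring.
const prompt = [
  "Run /cso --infra on this repo.",
  "The repo is tiny: a GitHub Actions workflow, a Dockerfile, and one app file.",
  "Skip any preamble and repo exploration; audit those files directly.",
].join("\n");

const options = {
  maxTurns: 30,                                    // was 60
  allowedTools: ["Read", "Grep", "Glob", "Bash"],  // Agent removed: no subagent spawns
};
```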

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.11.6.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Codex adversarial findings in /cso v2

Six fixes from Codex adversarial review:

1. Phase 2: Use `git log -G` (regex) instead of `-S` (literal) for
   patterns with alternation (ghp_|gho_|github_pat_, etc.); see the
   sketch after this list

2. Phase 12 exclusion #5: Add exception so CI/CD pipeline findings
   from Phase 4 are never auto-discarded when --infra is active

3. Phase 12 exclusion #6: Add exception that unpinned actions and
   missing CODEOWNERS are concrete risks, not "missing hardening"

4. Phase 12 exclusion #15: Add exception that SKILL.md files are
   executable prompt code, not documentation — Phase 8 findings
   in SKILL.md must not be excluded

5. Phase 12 exclusion #1: Add exception that LLM cost/spend
   amplification from Phase 7 is financial risk, not DoS

6. E2E tests: Add an `exitReason === 'success'` assertion to all 3 tests;
   move `finalizeEvalCollector` to a file-level `afterAll`
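
Expanding on fix 1, a minimal sketch of the difference (helper name is illustrative):

```ts
import { execSync } from "node:child_process";

// `git log -S` counts changes in occurrences of a *literal* string, so an
// alternation like "ghp_|gho_" would be searched verbatim and match nothing.
// `git log -G` treats the pattern as a regex over added/removed diff lines,
// which is what secrets archaeology across history actually needs.
const SECRET_PATTERN = "ghp_|gho_|github_pat_"; // GitHub token prefixes

function commitsTouchingSecrets(repoDir: string): string[] {
  const out = execSync(`git log --all --format=%H -G'${SECRET_PATTERN}'`, {
    cwd: repoDir,
    encoding: "utf8",
  });
  return out.split("\n").filter(Boolean);
}
```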

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 06:57:22 -07:00


Acknowledgements

/cso v2 was informed by research across the security audit landscape. Credits to:

  • Sentry Security Review — The confidence-based reporting system (only HIGH confidence findings get reported) and the "research before reporting" methodology (trace data flow, check upstream validation) validated our 8/10 daily confidence gate; a sketch follows this list. TimOnWeb rated it the only security skill worth installing of the 5 they tested.
  • Trail of Bits Skills — The audit-context-building methodology (build a mental model before hunting bugs) directly inspired Phase 0. Their variant analysis concept (found one vuln? Search the whole codebase for the same pattern) inspired Phase 12's variant analysis step.
  • Shannon by Keygraph — Autonomous AI pentester achieving 96.15% on the XBOW benchmark (100/104 exploits). Validated that AI can do real security testing, not just checklist scanning. Our Phase 12 active verification is the static-analysis version of what Shannon does live.
  • afiqiqmal/claude-security-audit — The AI/LLM-specific security checks (prompt injection, RAG poisoning, tool calling permissions) inspired Phase 7. Their framework-level auto-detection (detecting "Next.js" not just "Node/TypeScript") inspired Phase 0's framework detection step.
  • Snyk ToxicSkills Research — The finding that 36% of AI agent skills have security flaws and 13.4% are malicious inspired Phase 8 (Skill Supply Chain scanning).
  • Daniel Miessler's Personal AI Infrastructure — The incident response playbooks and protection file concept informed the remediation and LLM security phases.
  • McGo/claude-code-security-audit — The idea of generating shareable reports and actionable epics informed our report format evolution.
  • Claude Code Security Pack — Modular approach (separate /security-audit, /secret-scanner, /deps-check skills) validated that these are distinct concerns. Our unified approach sacrifices modularity for cross-phase reasoning.
  • Anthropic Claude Code Security — Multi-stage verification and confidence scoring validated our parallel finding verification approach. Found 500+ zero-days in open source.
  • @gus_argon — Identified critical v1 blind spots: no stack detection (running all-language patterns), bash grep in place of Claude Code's Grep tool, silent truncation of results via `| head -20`, and preamble bloat. These directly shaped v2's stack-first approach and Grep tool mandate.
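
For concreteness, the dual confidence gate credited above reduces to something like this (the 8/10 and 2/10 thresholds come from the skill; everything else is a sketch):

```ts
type Mode = "daily" | "comprehensive";

// Findings carry a 0-10 confidence score from verification (assumed scale).
interface ScoredFinding { title: string; confidence: number }

// Daily runs stay quiet (report only near-certain findings); comprehensive
// runs stay thorough (surface almost everything for human review).
function gate(findings: ScoredFinding[], mode: Mode): ScoredFinding[] {
  const threshold = mode === "daily" ? 8 : 2;
  return findings.filter((f) => f.confidence >= threshold);
}
```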