Mirror of https://github.com/garrytan/gstack.git, synced 2026-05-02 11:45:20 +02:00

Commit 3d1e8e0eac
* feat: /cso v2 — infrastructure-first security audit

  Rewrite /cso from code-centric OWASP scanning to infrastructure-first attack surface analysis. 15 phases covering secrets archaeology, dependency supply chain, CI/CD pipeline security, webhook verification, LLM/AI security, skill supply chain scanning, plus OWASP Top 10, STRIDE, and data classification.

  Key design decisions from eng review + Codex adversarial review:

  - Soft gate stack detection (prioritize, don't skip)
  - Error on conflicting scope flags (never silently ignore)
  - Permission gate before scanning ~/.claude/skills/
  - Graceful degradation when audit tools aren't installed
  - Finding fingerprints for cross-run trend tracking
  - Variant analysis: one verified vuln triggers a codebase-wide search
  - Dual confidence modes: daily (8/10 gate) vs comprehensive (2/10)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: /cso v2 acknowledgements — 10 projects that informed the design

  Credits: Sentry (confidence gating), Trail of Bits (mental model + variant analysis), Shannon/Keygraph (active verification validation), afiqiqmal (framework detection + LLM security), Snyk ToxicSkills (skill supply chain), Miessler PAI (incident playbooks), McGo (report format), Claude Code Security Pack (modular validation), Anthropic CCS (500+ zero-days), and @gus_argon (v1 blind spot identification).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: /cso v2 E2E tests — full audit, diff mode, infra scope

  Three E2E test cases with planted vulnerabilities:

  - cso-full-audit: hardcoded API key + .env tracked by git
  - cso-diff-mode: webhook without signature verification on a feature branch
  - cso-infra-scope: unpinned GitHub Action + Dockerfile without USER

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /cso E2E tests — correct logCost and recordE2E signatures

  logCost requires (label, result); recordE2E requires (collector, name, suite, result). Fixed all 3 test cases.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /cso infra E2E test — increase timeout to 360s

  The infra scope test runs Agent sub-tasks for parallel finding verification, which can take longer than 240s. Increased maxTurns from 25 to 60 and the timeout from 240s to 360s.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: /cso infra E2E test — sharper prompt to prevent exploration waste

  The agent was burning 30+ turns exploring a 3-file repo (18 Glob calls, an Explore subagent, 4 SKILL.md reads) before starting the audit. Two Agent verification subagents then ate ~100s, causing the 240s timeout.

  Fix: tell the agent the repo is tiny, list the exact files, skip the preamble, remove Agent from the allowed tools, and reduce maxTurns 60→30.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.11.6.0)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Codex adversarial findings in /cso v2

  Six fixes from the Codex adversarial review:

  1. Phase 2: Use `git log -G` (regex) instead of `-S` (literal) for patterns with alternation (ghp_|gho_|github_pat_, etc.)
  2. Phase 12 exclusion #5: Add an exception so CI/CD pipeline findings from Phase 4 are never auto-discarded when --infra is active
  3. Phase 12 exclusion #6: Add an exception that unpinned actions and missing CODEOWNERS are concrete risks, not "missing hardening"
  4. Phase 12 exclusion #15: Add an exception that SKILL.md files are executable prompt code, not documentation — Phase 8 findings in SKILL.md must not be excluded
  5. Phase 12 exclusion #1: Add an exception that LLM cost/spend amplification from Phase 7 is financial risk, not DoS
  6. E2E tests: Add an exitReason === 'success' assertion to all 3 tests; move finalizeEvalCollector to a file-level afterAll

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
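To make fix #1 above concrete: `git log -S` does a literal "pickaxe" search, so an alternation pattern is treated as the exact string `ghp_|gho_|github_pat_` and matches nothing, whereas `-G` compiles its argument as a regex. A minimal sketch (`--all` and `--oneline` are illustrative flag choices, not part of the fix):

```sh
# Literal pickaxe: looks for the exact string "ghp_|gho_|github_pat_"
# in added/removed lines, so the alternation silently finds nothing.
git log -S 'ghp_|gho_|github_pat_' --all --oneline

# Regex pickaxe: -G treats the argument as a regex, so any commit that
# ever added or removed a GitHub token prefix shows up.
git log -G 'ghp_|gho_|github_pat_' --all --oneline
```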
# Acknowledgements
/cso v2 was informed by research across the security audit landscape. Credits to:
- **[Sentry Security Review](https://github.com/getsentry/skills)** — The confidence-based reporting system (only HIGH-confidence findings get reported) and the "research before reporting" methodology (trace the data flow, check for upstream validation) validated our 8/10 daily confidence gate; a minimal sketch of such a gate appears after this list. TimOnWeb rated it the only security skill worth installing of the five tested.
- **[Trail of Bits Skills](https://github.com/trailofbits/skills)** — The audit-context-building methodology (build a mental model of the system before hunting bugs) directly inspired Phase 0. Their variant analysis concept (once one vuln is verified, search the whole codebase for the same pattern; see the sketch after this list) inspired Phase 12's variant analysis step.
- **[Shannon by Keygraph](https://github.com/KeygraphHQ/shannon)** — An autonomous AI pentester that achieves 96.15% on the XBOW benchmark (100/104 exploits). It validated that AI can do real security testing, not just checklist scanning; our Phase 12 active verification is the static-analysis version of what Shannon does live.
- **[afiqiqmal/claude-security-audit](https://github.com/afiqiqmal/claude-security-audit)** — The AI/LLM-specific security checks (prompt injection, RAG poisoning, tool calling permissions) inspired Phase 7. Their framework-level auto-detection (detecting "Next.js" not just "Node/TypeScript") inspired Phase 0's framework detection step.
- **[Snyk ToxicSkills Research](https://snyk.io/blog/toxicskills-malicious-ai-agent-skills-clawhub/)** — The finding that 36% of AI agent skills have security flaws and 13.4% are malicious inspired Phase 8 (Skill Supply Chain scanning).
- **[Daniel Miessler's Personal AI Infrastructure](https://github.com/danielmiessler/Personal_AI_Infrastructure)** — The incident response playbooks and protection file concept informed the remediation and LLM security phases.
- **[McGo/claude-code-security-audit](https://github.com/McGo/claude-code-security-audit)** — The idea of generating shareable reports and actionable epics informed our report format evolution.
- **[Claude Code Security Pack](https://dev.to/myougatheaxo/automate-owasp-security-audits-with-claude-code-security-pack-4mah)** — Its modular approach (separate /security-audit, /secret-scanner, and /deps-check skills) validated that these are distinct concerns. Our unified approach sacrifices that modularity for cross-phase reasoning.
- **[Anthropic Claude Code Security](https://www.anthropic.com/news/claude-code-security)** — Multi-stage verification and confidence scoring validated our parallel finding-verification approach; it has found 500+ zero-days in open source.
- **[@gus_argon](https://x.com/gus_aragon/status/2035841289602904360)** — Identified critical v1 blind spots: no stack detection (v1 ran all-language patterns), bash grep used instead of Claude Code's Grep tool, `| head -20` silently truncating results (illustrated after this list), and preamble bloat. These directly shaped v2's stack-first approach and its Grep tool mandate.
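A minimal sketch of the confidence-gating idea mentioned for Sentry above. The `findings.json` file, its `confidence` field (scored 1-10), and the use of `jq` are all illustrative assumptions, not /cso's actual internals:

```sh
# Daily mode: report only HIGH-confidence findings (score >= 8).
jq '[.findings[] | select(.confidence >= 8)]' findings.json

# Comprehensive mode: keep nearly everything (score >= 2) for human triage.
jq '[.findings[] | select(.confidence >= 2)]' findings.json
```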
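A sketch of the Trail of Bits-style variant analysis step, assuming ripgrep and a hypothetical verified finding (a SQL query built by template-string interpolation); the pattern itself is illustrative:

```sh
# One verified vuln becomes a codebase-wide query: search every TypeScript
# file for the same dangerous sink (string interpolation inside query()).
rg -n 'query\(`[^`]*\$\{' --type ts
```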
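To make the `| head -20` blind spot from the last item concrete (the search pattern is illustrative; any repo with more than 20 matches will do):

```sh
# v1-style scan: grep the tree and keep only the first 20 lines of output.
# If there are 200 matches, 180 are dropped with no warning, no count,
# and no exit-code signal that anything was cut off.
grep -rn 'password' src/ | head -20
```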