mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-05 05:05:08 +02:00
feat: zero-noise /cso security audits with FP filtering (v0.11.0.0)
Absorb Anthropic's security-review false positive filtering into /cso:

- 17 hard exclusions (DOS, test files, log spoofing, SSRF path-only, regex injection, race conditions unless concrete, etc.)
- 9 precedents (React XSS-safe, env vars trusted, client-side code doesn't need auth, shell scripts need concrete untrusted input path)
- 8/10 confidence gate — below threshold = don't report
- Independent sub-agent verification for each finding
- Exploit scenario requirement per finding
- Framework-aware analysis (Rails CSRF, React escaping, Angular sanitization)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -420,24 +420,113 @@ PUBLIC:
 - Marketing content, documentation, public APIs
 ```
 
-### Phase 5: Findings Report
+### Phase 5: False Positive Filtering
 
-Rate each finding using CVSS-inspired scoring:
+Before producing findings, run every candidate through this filter. The goal is
+**zero noise** — better to miss a theoretical issue than flood the report with
+false positives that erode trust.
+
+**Hard exclusions — automatically discard findings matching these:**
+
+1. Denial of Service (DOS), resource exhaustion, or rate limiting issues
+2. Secrets or credentials stored on disk if otherwise secured (encrypted, permissioned)
+3. Memory consumption, CPU exhaustion, or file descriptor leaks
+4. Input validation concerns on non-security-critical fields without proven impact
+5. GitHub Action workflow issues unless clearly triggerable via untrusted input
+6. Missing hardening measures — flag concrete vulnerabilities, not absent best practices
+7. Race conditions or timing attacks unless concretely exploitable with a specific path
+8. Vulnerabilities in outdated third-party libraries (handled by A06, not individual findings)
+9. Memory safety issues in memory-safe languages (Rust, Go, Java, C#)
+10. Files that are only unit tests or test fixtures
+11. Log spoofing — outputting unsanitized input to logs is not a vulnerability
+12. SSRF where attacker only controls the path, not the host or protocol
+13. User-controlled content in AI system prompts is not injection (it's the feature)
+14. Regex injection or regex DOS — not a real vulnerability class
+15. Security concerns in documentation files (*.md)
+16. Missing audit logs — absence of logging is not a vulnerability
+17. Insecure randomness in non-security contexts (e.g., UI element IDs)
+
+**Precedents — established rulings that prevent recurring false positives:**
+
+1. Logging secrets in plaintext IS a vulnerability. Logging URLs is safe.
+2. UUIDs are unguessable — don't flag missing UUID validation.
+3. Environment variables and CLI flags are trusted input. Attacks requiring
+attacker-controlled env vars are invalid.
+4. React and Angular are XSS-safe by default. Only flag `dangerouslySetInnerHTML`,
+`bypassSecurityTrustHtml`, or equivalent escape hatches.
+5. Client-side JS/TS does not need permission checks or auth — that's the server's job.
+Don't flag frontend code for missing authorization.
+6. Shell script command injection needs a concrete untrusted input path.
+Shell scripts generally don't receive untrusted user input.
+7. Subtle web vulnerabilities (tabnabbing, XS-Leaks, prototype pollution, open redirects)
+only if extremely high confidence with concrete exploit.
+8. iPython notebooks (*.ipynb) — only flag if untrusted input can trigger the vulnerability.
+9. Logging non-PII data is not a vulnerability even if the data is somewhat sensitive.
+Only flag logging of secrets, passwords, or PII.
+
+**Confidence gate:** Every finding must score **≥ 8/10 confidence** to appear in the
+final report. Score calibration:
+- **9-10:** Certain exploit path identified. Could write a PoC.
+- **8-9:** Clear vulnerability pattern with known exploitation methods.
+- **7-8:** Suspicious pattern requiring specific conditions. Include only if HIGH+ severity.
+- **Below 7:** Do not report. Too speculative.
+
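The Phase 5 filter above can be sketched as a small function. This is a minimal illustration, not the skill's implementation: the `Finding` fields, the `exclusion` tag, and the `HARD_EXCLUSIONS` names are hypothetical. The sketch applies the strict 8/10 gate and deliberately ignores the 7-8 "HIGH+ only" calibration note.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tags for a few of the 17 hard exclusions; names are illustrative only.
HARD_EXCLUSIONS = {"dos", "test_file", "log_spoofing", "ssrf_path_only", "regex_injection"}

CONFIDENCE_GATE = 8  # findings below 8/10 are not reported


@dataclass
class Finding:
    title: str
    severity: str                     # "CRIT", "HIGH", "MED", ...
    confidence: int                   # 1-10
    exclusion: Optional[str] = None   # tag of a matching hard exclusion, if any


def filter_findings(candidates):
    """Return (reported, filtered_count) after exclusions and the confidence gate."""
    reported = []
    for f in candidates:
        if f.exclusion in HARD_EXCLUSIONS:
            continue  # hard exclusion: discard regardless of confidence
        if f.confidence < CONFIDENCE_GATE:
            continue  # too speculative to report
        reported.append(f)
    return reported, len(candidates) - len(reported)


candidates = [
    Finding("Raw SQL in search controller", "CRIT", 9),
    Finding("No rate limiting on password reset", "INFO", 9, exclusion="dos"),
    Finding("Possible open redirect", "MED", 6),
]
reported, filtered = filter_findings(candidates)
print([f.title for f in reported], filtered)
```

Returning the filtered count alongside the survivors keeps the "N candidates scanned, M filtered as FP" statistic available for the saved report.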
+### Phase 5.5: Parallel Finding Verification
+
+For each candidate finding that survives the hard exclusion filter, launch an
+independent verification sub-task using the Agent tool. The verifier has fresh
+context and cannot see the initial scan's reasoning — only the finding itself
+and the false positive filtering rules.
+
+Prompt each verifier sub-task with:
+- The specific finding (file, line, category, description)
+- The full false positive filtering rules (hard exclusions + precedents)
+- Instruction: "Read the code at this location. Is this a real, exploitable
+vulnerability? Assign a confidence score 1-10. If below 8, explain why
+it's likely a false positive."
+
+Launch all verifier sub-tasks in parallel. Discard any finding where the
+verifier scores confidence below 8.
+
+If the Agent tool is unavailable, perform the verification pass yourself
+by re-reading the code for each finding with a skeptic's eye. Note: "Self-verified
+— independent sub-task unavailable."
+
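The fan-out described in Phase 5.5 can be illustrated with a thread pool. This is only a sketch under assumptions: `verifier` is a hypothetical callable standing in for an Agent sub-task, and `stub_verifier` is placeholder scoring so the example runs; neither is part of gstack.

```python
from concurrent.futures import ThreadPoolExecutor

GATE = 8  # verifier scores below this discard the finding


def verify_all(findings, verifier, max_workers=8):
    """Run `verifier(finding) -> int` (score 1-10) on every finding in parallel.

    `verifier` is a hypothetical stand-in for an independent sub-task that sees
    only the finding and the filtering rules, not the initial scan's reasoning.
    """
    if not findings:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(verifier, findings))  # results keep input order
    return [f for f, score in zip(findings, scores) if score >= GATE]


def stub_verifier(finding):
    # Placeholder scoring so the sketch runs; a real verifier would read the code.
    return finding["initial_confidence"] - (2 if finding["category"] == "Config" else 0)


findings = [
    {"file": "app/search.rb", "line": 47, "category": "Injection", "initial_confidence": 9},
    {"file": "server.ts", "line": 34, "category": "Config", "initial_confidence": 9},
]
kept = verify_all(findings, stub_verifier)
print([f["file"] for f in kept])
```

Each verifier call receives only the finding dictionary, which mirrors the "fresh context" requirement: nothing from the initial scan's reasoning is passed along.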
+### Phase 6: Findings Report
+
+**Exploit scenario requirement:** Every finding MUST include a concrete exploit
+scenario — a step-by-step attack path an attacker would follow. "This pattern
+is insecure" is not a finding. "Attacker sends POST /api/users?id=OTHER_USER_ID
+and receives the other user's data because the controller uses params[:id]
+without scoping to current_user" is a finding.
+
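The access-control scenario quoted above refers to a Rails controller. The same pattern is shown here as a Python analogue, purely for illustration: `USERS`, `get_user_unscoped`, and `get_user_scoped` are hypothetical names, and the in-memory dictionary stands in for a database table.

```python
# Hypothetical in-memory records standing in for a users table.
USERS = {
    1: {"id": 1, "email": "alice@example.com"},
    2: {"id": 2, "email": "bob@example.com"},
}


def get_user_unscoped(requested_id, current_user_id):
    # Vulnerable: trusts the id from the request and ignores who is asking.
    return USERS.get(requested_id)


def get_user_scoped(requested_id, current_user_id):
    # Fixed: only the caller's own record is reachable.
    if requested_id != current_user_id:
        return None
    return USERS.get(requested_id)


# User 1 requests user 2's record.
print(get_user_unscoped(2, current_user_id=1))
print(get_user_scoped(2, current_user_id=1))
```

A finding written against the unscoped version can name the exact request, the record returned, and the missing check, which is the level of detail the exploit scenario requirement asks for.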
+Rate each finding:
 ```
 SECURITY FINDINGS
 ═════════════════
-Sev    Category         Finding                                  OWASP   Status
-────   ────────         ───────                                  ─────   ──────
-CRIT   Injection        Raw SQL in search controller             A03     Open
-HIGH   Access Control   Missing auth on /api/admin/users         A01     Open
-HIGH   Crypto           API keys in plaintext config file        A02     Open
-MED    Config           CORS allows *, should be restricted      A05     Open
-MED    Logging          Failed auth attempts not logged          A09     Open
-LOW    Components       lodash@4.17.11 has prototype pollution   A06     Open
-INFO   Design           No rate limiting on password reset       A04     Open
+#   Sev    Conf   Category         Finding                          OWASP   File:Line
+──  ────   ────   ────────         ───────                          ─────   ─────────
+1   CRIT   9/10   Injection        Raw SQL in search controller     A03     app/search.rb:47
+2   HIGH   8/10   Access Control   Missing auth on admin endpoint   A01     api/admin.ts:12
+3   HIGH   9/10   Crypto           API keys in plaintext config     A02     config/app.yml:8
+4   MED    8/10   Config           CORS allows * in production      A05     server.ts:34
 ```
 
-### Phase 6: Remediation Roadmap
+For each finding, include:
+
+```
+## Finding 1: [Title] — [File:Line]
+
+* **Severity:** CRITICAL | HIGH | MEDIUM
+* **Confidence:** N/10
+* **OWASP:** A01-A10
+* **Description:** [What's wrong — one paragraph]
+* **Exploit scenario:** [Step-by-step attack path — be specific]
+* **Impact:** [What an attacker gains — data breach, RCE, privilege escalation]
+* **Recommendation:** [Specific code change with example]
+```
+
+### Phase 7: Remediation Roadmap
 
 For the top 5 findings, present via AskUserQuestion:
 
@@ -450,25 +539,33 @@ For the top 5 findings, present via AskUserQuestion:
 - C) Accept risk — [document why, set review date]
 - D) Defer to TODOS.md with security label
 
-### Phase 7: Save Report
+### Phase 8: Save Report
 
 ```bash
 mkdir -p .gstack/security-reports
 ```
 
-Write findings to `.gstack/security-reports/{date}.json`.
+Write findings to `.gstack/security-reports/{date}.json`. Include:
+- Each finding with severity, confidence, category, file, line, description
+- Verification status (independently verified or self-verified)
+- Total findings by severity tier
+- False positives filtered count (so you can track filter effectiveness)
 
 If prior reports exist, show:
 - **Resolved:** Findings fixed since last audit
 - **Persistent:** Findings still open
 - **New:** Findings discovered this audit
 - **Trend:** Security posture improving or degrading?
+- **Filter stats:** N candidates scanned, M filtered as FP, K reported
 
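The report contents and the comparison against a prior audit listed above could look roughly like this. The JSON field names and the `finding_key` identity (category, file, line) are assumptions made for this sketch; the skill does not prescribe a schema.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def finding_key(f):
    # Assumed identity for matching findings across audits: category + file + line.
    return (f["category"], f["file"], f["line"])


def build_report(findings, candidates_scanned):
    return {
        "findings": findings,
        "totals_by_severity": dict(Counter(f["severity"] for f in findings)),
        "filter_stats": {
            "candidates": candidates_scanned,
            "filtered_as_fp": candidates_scanned - len(findings),
            "reported": len(findings),
        },
    }


def compare(previous, current):
    prev = {finding_key(f) for f in previous["findings"]}
    curr = {finding_key(f) for f in current["findings"]}
    return {
        "resolved": sorted(prev - curr),
        "persistent": sorted(prev & curr),
        "new": sorted(curr - prev),
    }


old = build_report(
    [{"severity": "CRIT", "confidence": 9, "category": "Injection",
      "file": "app/search.rb", "line": 47, "verification": "independent"}],
    candidates_scanned=10,
)
new = build_report(
    [{"severity": "HIGH", "confidence": 8, "category": "Access Control",
      "file": "api/admin.ts", "line": 12, "verification": "independent"}],
    candidates_scanned=6,
)

# Written to a temporary directory here; the skill itself uses .gstack/security-reports/.
report_dir = Path(tempfile.mkdtemp()) / ".gstack" / "security-reports"
report_dir.mkdir(parents=True, exist_ok=True)
(report_dir / "2026-03-22.json").write_text(json.dumps(new, indent=2))

diff = compare(old, new)
print(diff)
```

Matching on file and line is fragile when code moves between audits; a real comparison would likely need a more stable identity, which this sketch does not attempt.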
 ## Important Rules
 
 - **Think like an attacker, report like a defender.** Show the exploit path, then the fix.
+- **Zero noise is more important than zero misses.** A report with 3 real findings is worth more than one with 3 real + 12 theoretical. Users stop reading noisy reports.
 - **No security theater.** Don't flag theoretical risks with no realistic exploit path. Focus on doors that are actually unlocked.
 - **Severity calibration matters.** A CRITICAL finding needs a realistic exploitation scenario. If you can't describe how an attacker would exploit it, it's not CRITICAL.
+- **Confidence gate is absolute.** Below 8/10 confidence = do not report. Period.
 - **Read-only.** Never modify code. Produce findings and recommendations only.
 - **Assume competent attackers.** Don't assume security through obscurity works.
 - **Check the obvious first.** Hardcoded credentials, missing auth checks, and SQL injection are still the top real-world vectors.
+- **Framework-aware.** Know your framework's built-in protections. Rails has CSRF tokens by default. React escapes by default. Don't flag what the framework already handles.

@@ -1,5 +1,14 @@
 # Changelog
 
+## [0.11.0.0] - 2026-03-22 — Zero-Noise Security Audits
+
+### Changed
+
+- **`/cso` now filters false positives like Anthropic's security review.** 17 hard exclusions (DOS, test files, log spoofing, SSRF path-only, regex injection, and 12 more) plus 9 established precedents (React is XSS-safe by default, env vars are trusted, client-side code doesn't need auth checks). Every finding must score 8/10+ confidence with a concrete exploit scenario — "this pattern is bad" doesn't make the cut. The result: reports with 3 real findings instead of 3 real + 12 theoretical.
+- **Independent finding verification.** Each candidate finding is verified by a fresh sub-agent that only sees the finding and the FP rules — no anchoring bias from the initial scan. Findings that fail independent verification are silently dropped.
+- **Exploit scenario requirement.** Every finding now requires a step-by-step attack path: who sends what, to where, and what they get. No more "insecure pattern detected" without a walkable attack.
+- **Framework-aware analysis.** /cso now knows that Rails has CSRF tokens, React escapes HTML, Angular sanitizes by default. It won't flag what the framework already handles.
+
 ## [0.10.1.0] - 2026-03-22 — Community Security Wave
 
 ### Added
