# 🧠 AI Integration Guide

<p align="center">
  <img src="https://img.shields.io/badge/Ollama-local-blueviolet?style=for-the-badge&logo=ollama" alt="Ollama">
  <img src="https://img.shields.io/badge/privacy-100%25%20offline-success?style=for-the-badge" alt="Privacy">
  <img src="https://img.shields.io/badge/cost-%240-green?style=for-the-badge" alt="Cost">
  <img src="https://img.shields.io/badge/CVE%20correlation-automated-critical?style=for-the-badge" alt="CVE">
</p>

> **No API keys. No cloud. No telemetry. No usage caps. Runs on your laptop.**

God's Eye v2 is the only open-source attack-surface tool with **automated CVE correlation via a local LLM**. Apache 2.4.7 detected → CVE-2026-34197 surfaced. WordPress 5.8.2 fingerprinted → known vulnerabilities chained. All of this runs through an Ollama cascade: a small model triages each finding, then a deep model drills into the ones that matter. In the balanced and heavy profiles that deep model is a **30B Mixture-of-Experts model** that activates just 3.3B parameters per token.

All LLM inference stays on your machine: prompts and responses never leave your hardware.

<p align="center">
  <img src="assets/ai-verbose.gif" alt="AI cascade against Apache 2.4.7 on scanme.nmap.org" width="90%">
</p>

<p align="center">
  <sub><em>Every scan ends with an <b>AI SCAN BRIEF</b> — severity totals, top exploitable chains, executive summary, and recommended next actions — framed in the terminal. Recorded live on <code>scanme.nmap.org</code>, models served by local Ollama.</em></sub>
</p>

---

## 🎯 End-of-scan brief

Every scan that produces findings ends with a framed summary the AI writes for you. Six parts:

```
┌──  AI SCAN BRIEF — target.com  ─────────────────────────────────────────────┐
│ Totals
│   Hosts: 17   Active: 13   AI findings: 23
│
│ Findings by severity
│    CRIT   critical   2
│   [HIGH]  high       7
│   [MED]   medium     12
│   [LOW]   low        4
│
│ Top exploitable chains
│   ▸ admin.target.com  — Git Repository Exposed + Open Redirect
│   ▸ api.target.com    — CORS Misconfiguration + JWT alg=none
│   ▸ legacy.target.com — Apache@2.4.7→CVE-2026-34197
│
│ AI agents that contributed
│   • http-analyzer       8 findings
│   • secret-validator    6 findings
│   • anomaly-detector    1 findings
│   • report-writer       1 findings
│
│ AI executive summary
│   Scan identified two critical issues requiring immediate attention:
│   exposed git repository on admin.target.com and an Apache 2.4.7 server
│   (end-of-life since 2014) running on legacy.target.com. The cross-host
│   anomaly detector flagged a dev-environment leak into production.
│
│ Recommended next actions
│   1. Remove .git directory from admin.target.com (CRITICAL)
│   2. Patch Apache 2.4.7 → vendor latest (affects legacy.target.com)
│   3. Rotate JWT signing key on api.target.com
│   4. Move dev.api.target.com off production DNS
│   5. Investigate anomaly: shared SSH key across 3 hosts
└─────────────────────────────────────────────────────────────────────────────┘
```

It's generated by `internal/modules/brief`, runs in `PhaseReporting` after all other modules have finished, and only prints when findings exist (silent/JSON modes suppress it automatically).
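A minimal sketch of the bookkeeping behind the severity and agent sections, assuming a hypothetical `Finding` type (the real types in `internal/modules/brief` may look different). It also encodes the print rule: nothing is shown for an empty scan or in silent/JSON mode.

```go
package main

import (
	"fmt"
	"sort"
)

// Finding is a hypothetical stand-in for an AI finding; the real types
// live in god-eye's internal packages and may differ.
type Finding struct {
	Host     string
	Severity string // "critical", "high", "medium" or "low"
	Agent    string // agent that produced the finding
}

// severityOrder fixes the order in which severities are printed.
var severityOrder = []string{"critical", "high", "medium", "low"}

// shouldPrintBrief encodes the rule above: print only when there are
// findings and neither silent nor JSON output mode is active.
func shouldPrintBrief(n int, silent, jsonOut bool) bool {
	return n > 0 && !silent && !jsonOut
}

// tally counts findings per severity and per contributing agent.
func tally(fs []Finding) (bySev, byAgent map[string]int) {
	bySev, byAgent = map[string]int{}, map[string]int{}
	for _, f := range fs {
		bySev[f.Severity]++
		byAgent[f.Agent]++
	}
	return bySev, byAgent
}

// agentsByCount lists agents with the most findings first (ties by name).
func agentsByCount(byAgent map[string]int) []string {
	names := make([]string, 0, len(byAgent))
	for a := range byAgent {
		names = append(names, a)
	}
	sort.Slice(names, func(i, j int) bool {
		if byAgent[names[i]] != byAgent[names[j]] {
			return byAgent[names[i]] > byAgent[names[j]]
		}
		return names[i] < names[j]
	})
	return names
}

func main() {
	fs := []Finding{
		{"admin.target.com", "critical", "http-analyzer"},
		{"api.target.com", "high", "secret-validator"},
		{"api.target.com", "high", "http-analyzer"},
	}
	if !shouldPrintBrief(len(fs), false, false) {
		return
	}
	bySev, byAgent := tally(fs)
	fmt.Println("Findings by severity")
	for _, s := range severityOrder {
		fmt.Printf("  %-9s %d\n", s, bySev[s])
	}
	fmt.Println("AI agents that contributed")
	for _, a := range agentsByCount(byAgent) {
		fmt.Printf("  • %-18s %d findings\n", a, byAgent[a])
	}
}
```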

---

## Table of contents

1. [Quick start (5 minutes)](#quick-start-5-minutes)
2. [How the cascade works](#how-the-cascade-works)
3. [AI profiles — pick your tier](#ai-profiles--pick-your-tier)
4. [The interactive wizard](#the-interactive-wizard)
5. [Auto-pull of missing models](#auto-pull-of-missing-models)
6. [Verbose mode](#verbose-mode)
7. [Multi-agent orchestration](#multi-agent-orchestration)
8. [CVE matching](#cve-matching)
9. [Custom models + YAML config](#custom-models--yaml-config)
10. [Troubleshooting](#troubleshooting)
11. [Privacy & security model](#privacy--security-model)
12. [Performance reference](#performance-reference)
13. [Reference: every AI-related flag](#reference--every-ai-related-flag)

---

## Quick start (5 minutes)

### 1. Install Ollama

**Linux:**
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

**macOS / Windows:** download from [ollama.com/download](https://ollama.com/download). On macOS, `brew install ollama` also works.

Verify:
```bash
ollama --version
```

### 2. Start the Ollama server

```bash
ollama serve &
```

Listens on `http://localhost:11434`. Leave it running.

### 3. Run God's Eye

The easiest path — let the wizard handle everything:

```bash
./god-eye
```

It will:
1. Ask which AI tier you want (lean / balanced / heavy / none)
2. Check which models are already installed
3. Offer to download missing ones (with live progress)
4. Ask for your target domain
5. Start the scan

Manual route:

```bash
# Defaults (lean tier): pulls qwen3:1.7b + qwen2.5-coder:14b if missing
./god-eye -d target.com --pipeline --enable-ai

# Balanced tier (32GB RAM): MoE deep model, 256K context
./god-eye -d target.com --pipeline --enable-ai --ai-profile balanced

# Heavy tier (64GB+ RAM): best quality
./god-eye -d target.com --pipeline --enable-ai --ai-profile heavy --ai-verbose
```

---

## How the cascade works

Every finding goes through a two-stage pipeline:

```
┌──────────────────────────────────────────────┐
│  FINDING DETECTED                            │
│  (JS secret, HTTP response, tech version,    │
│   takeover candidate, vuln, etc.)            │
└──────────────┬───────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────┐
│  TIER 1: FAST TRIAGE                         │
│  • lean:     qwen3:1.7b                      │
│  • balanced: qwen3:4b                        │
│  • heavy:    qwen3:8b                        │
│                                              │
│  Output: "relevant" vs "skip"                │
│  Latency: 0.5–2 seconds                      │
└──────────────┬───────────────────────────────┘
               │  if relevant ↓
               ▼
┌──────────────────────────────────────────────┐
│  TIER 2: DEEP ANALYSIS                       │
│  • lean:     qwen2.5-coder:14b               │
│  • balanced: qwen3-coder:30b (MoE)           │
│  • heavy:    qwen3-coder:30b (MoE)           │
│                                              │
│  Output: severity, description, PoC,         │
│          remediation, OWASP + CVE matches    │
│  Latency: 5–25 seconds                       │
└──────────────┬───────────────────────────────┘
               │
               ▼
         AIFinding event → store → report
```

**Why two tiers?** Cost versus quality. The fast model discards roughly 70% of findings as non-issues, so the deep model's runtime is spent only on the remainder. That cuts total wall-clock time by 40–60%, and the findings that do surface get the same deep analysis they would have received without the cascade.

Disable the cascade to always run the deep model (slower, no quality gain on most findings):

```bash
./god-eye -d target.com --pipeline --enable-ai --ai-cascade=false
```

---

## AI profiles — pick your tier

| Profile          | Triage model | Deep model              | Disk pull | VRAM (Q4) | Best for                        |
|------------------|--------------|-------------------------|-----------|-----------|---------------------------------|
| `lean` (default) | qwen3:1.7b   | qwen2.5-coder:14b       | ~10GB     | ~10GB     | 16GB RAM laptops, CI runners    |
| `balanced`       | qwen3:4b     | qwen3-coder:30b **(MoE)** | ~20GB   | ~17GB     | 32GB RAM workstations — **sweet spot** |
| `heavy`          | qwen3:8b     | qwen3-coder:30b **(MoE)** | ~23GB   | ~22GB     | 64GB+ servers, top-quality runs |

### Why MoE (Mixture of Experts) matters for balanced/heavy

`qwen3-coder:30b` is a **Mixture-of-Experts** model with 30B total parameters but only **3.3B active per token**. Inference speed is closer to a dense 3B model while quality is closer to a dense 30B. Combined with a 256K context window it can ingest entire JS bundles + long HTTP response bodies in a single prompt — useful for the deep-analysis step.

### Pick your profile with one question

> *"How much RAM can I dedicate to Ollama while the scan runs?"*

- **< 16GB** → use `lean`, possibly shrink with `--ai-deep-model qwen2.5-coder:7b`
- **16–32GB** → `lean` (or `balanced` if your deep model fits)
- **32GB+** → `balanced` (recommended) or `heavy`

The wizard asks this for you if you're unsure.
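The same rule of thumb as a small Go function. `suggestProfile` and `flagsFor` are hypothetical helpers, not part of god-eye; they only turn a RAM figure into the flags documented in this guide.

```go
package main

import "fmt"

// suggestProfile maps the RAM (GB) you can give Ollama to an --ai-profile
// value, plus an optional --ai-deep-model override for small machines.
func suggestProfile(ramGB int) (profile, deepModel string) {
	switch {
	case ramGB < 16:
		// Shrink the deep model so the lean tier still fits.
		return "lean", "qwen2.5-coder:7b"
	case ramGB < 32:
		return "lean", ""
	default:
		// balanced is the recommended pick; heavy is an option at 64GB+.
		return "balanced", ""
	}
}

// flagsFor renders the suggestion as god-eye command-line flags.
func flagsFor(ramGB int) []string {
	profile, deep := suggestProfile(ramGB)
	args := []string{"--enable-ai", "--ai-profile", profile}
	if deep != "" {
		args = append(args, "--ai-deep-model", deep)
	}
	return args
}

func main() {
	for _, ram := range []int{8, 16, 32, 64} {
		fmt.Println(ram, flagsFor(ram))
	}
}
```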

---

## The interactive wizard

Run `./god-eye` with no `-d` flag in a terminal — the wizard launches automatically:

```
═══════════════════════════════════════════════════════════
  God's Eye v2 — interactive setup
  Ctrl-C to abort at any time.
═══════════════════════════════════════════════════════════

? Select AI tier
  ▸ 1) Lean     — 16GB RAM · qwen3:1.7b + qwen2.5-coder:14b (default)
    2) Balanced — 32GB RAM · qwen3:4b + qwen3-coder:30b (MoE, 256K ctx)
    3) Heavy    — 64GB RAM · qwen3:8b + qwen3-coder:30b (max quality)
    4) No AI    — Pure recon without LLM analysis
  Choice [1]:

⚙ Checking Ollama at http://localhost:11434…
  ↓ Missing models: qwen3:1.7b, qwen2.5-coder:14b
? Download missing models now? [Y/n]
  > y
↓ qwen3:1.7b
  pulling manifest         10%  150MB / 1.4GB
  pulling manifest         50%  700MB / 1.4GB
  pulling manifest        100%  1.4GB / 1.4GB
  verifying sha256 digest
  writing manifest
  success                 100%
✓ qwen3:1.7b ready
↓ qwen2.5-coder:14b
  …
✓ qwen2.5-coder:14b ready

? Target domain
  > target.com

? Select scan profile
    1) Quick
  ▸ 2) Bug bounty (default)
    3) Pentest
    4) ASM continuous
    5) Stealth max

…

─── Scan summary ───
  Target           target.com
  Scan profile     bugbounty
  AI tier          lean
  AI auto-pull     yes
  AI verbose       no
  Live view        yes (v=1)

? Start scan? [Y/n]
  >
```

Force the wizard even when `-d` is set (to review defaults):

```bash
./god-eye --wizard -d target.com
```
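One way to implement the launch rule described above, sketched in Go. `shouldRunWizard` is a hypothetical name, and the terminal check uses the standard-library `os.ModeCharDevice` test, which is only an approximation (`/dev/null` also passes it); `term.IsTerminal` from `golang.org/x/term` is the stricter option.

```go
package main

import (
	"fmt"
	"os"
)

// isTerminal reports whether f is a character device (typically an
// interactive terminal) rather than a pipe or a regular file.
func isTerminal(f *os.File) bool {
	fi, err := f.Stat()
	if err != nil {
		return false
	}
	return fi.Mode()&os.ModeCharDevice != 0
}

// shouldRunWizard: --wizard always wins; otherwise the wizard starts only
// when no -d target was given and stdin is a terminal.
func shouldRunWizard(domain string, forceWizard, tty bool) bool {
	if forceWizard {
		return true
	}
	return domain == "" && tty
}

func main() {
	tty := isTerminal(os.Stdin)
	fmt.Println("wizard:", shouldRunWizard("", false, tty))
}
```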

---

## Auto-pull of missing models

When `--enable-ai` is on and `--ai-auto-pull` is true (default), God's Eye checks Ollama at startup and downloads missing models before the pipeline starts.

Under the hood:

1. **Reachability check** — `GET /api/tags`. If unreachable, AI modules silently no-op and the scan proceeds without AI.
2. **Inventory compare** — matches installed models (by tag) against the profile's required set. Handles `:latest` suffix and tagless lookups.
3. **Stream pull** — `POST /api/pull` with `stream:true`, NDJSON progress parsed and throttled (new status or ≥5% jump triggers a log line).
4. **Ready** — returns control to the pipeline coordinator.
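The inventory compare and the progress throttle can be sketched offline against sample payloads. The JSON field names (`models[].name` from `GET /api/tags`; `status`, `total`, `completed` in the `/api/pull` stream) follow Ollama's API documentation; the helper names are hypothetical, and the HTTP calls themselves are left out.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// tagsResponse keeps the one field we need from GET /api/tags.
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// pullProgress is one NDJSON line from POST /api/pull with stream:true.
type pullProgress struct {
	Status    string `json:"status"`
	Total     int64  `json:"total"`
	Completed int64  `json:"completed"`
}

// normalize adds the implicit ":latest" tag so tagless names compare equal.
func normalize(name string) string {
	if !strings.Contains(name, ":") {
		return name + ":latest"
	}
	return name
}

// missingModels returns the required models absent from a /api/tags body.
func missingModels(tagsJSON []byte, required []string) ([]string, error) {
	var tr tagsResponse
	if err := json.Unmarshal(tagsJSON, &tr); err != nil {
		return nil, err
	}
	have := make(map[string]bool)
	for _, m := range tr.Models {
		have[normalize(m.Name)] = true
	}
	var missing []string
	for _, r := range required {
		if !have[normalize(r)] {
			missing = append(missing, r)
		}
	}
	return missing, nil
}

// progressLines throttles a pull stream: a line is emitted when the status
// changes or the percentage rises by at least 5 points.
func progressLines(ndjson string) []string {
	var out []string
	lastStatus, lastPct := "", -100
	sc := bufio.NewScanner(strings.NewReader(ndjson))
	for sc.Scan() {
		var p pullProgress
		if json.Unmarshal(sc.Bytes(), &p) != nil {
			continue // skip malformed lines
		}
		if p.Total <= 0 { // status-only line, e.g. "pulling manifest"
			if p.Status != lastStatus {
				out = append(out, p.Status)
				lastStatus, lastPct = p.Status, -100
			}
			continue
		}
		pct := int(p.Completed * 100 / p.Total)
		if p.Status != lastStatus || pct-lastPct >= 5 {
			out = append(out, fmt.Sprintf("%s %d%%", p.Status, pct))
			lastStatus, lastPct = p.Status, pct
		}
	}
	return out
}

func main() {
	tags := []byte(`{"models":[{"name":"qwen3:1.7b"}]}`)
	missing, err := missingModels(tags, []string{"qwen3:1.7b", "qwen2.5-coder:14b"})
	if err != nil {
		panic(err)
	}
	fmt.Println("missing:", missing) // missing: [qwen2.5-coder:14b]

	stream := `{"status":"pulling manifest"}
{"status":"pulling 6e4c","total":1000,"completed":100}
{"status":"pulling 6e4c","total":1000,"completed":120}
{"status":"pulling 6e4c","total":1000,"completed":500}
{"status":"success"}`
	for _, line := range progressLines(stream) {
		fmt.Println(line)
	}
}
```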

Disable auto-pull if you'd rather error out on missing models:

```bash
./god-eye -d target.com --pipeline --enable-ai --ai-auto-pull=false
```

When the wizard runs it asks explicitly before downloading. Non-wizard mode pulls silently unless `--ai-verbose` is set.

---

## Verbose mode

See every Ollama interaction in real time on stderr:

```bash
./god-eye -d target.com --pipeline --enable-ai --ai-verbose --live
```

Stderr output:

```
[ai] → qwen3:1.7b  prompt=2341B timeout=60s
[ai] ← qwen3:1.7b  response=512B  1.3s
[ai] → qwen2.5-coder:14b  prompt=8291B timeout=120s
[ai] ← qwen2.5-coder:14b  response=1832B  8.7s
[ai] → qwen2.5-coder:14b  prompt=5123B timeout=120s
[ai] ← qwen2.5-coder:14b  response=946B  5.2s
```

Useful for:
- Debugging slow runs (spot the 60s+ queries)
- Tuning the triage threshold (are "skip" decisions correct?)
- Verifying the cascade is actually running (triage fires before deep)
- Sanity-checking prompt sizes (large prompts = context-bloat → fix the caller)

Verbose goes to **stderr** so stdout JSON / silent modes still parse cleanly.

---

## Multi-agent orchestration

In addition to the cascade, God's Eye ships a system of eight specialized agents (inherited from v1). It is enabled automatically in the `bugbounty` and `pentest` profiles, or explicitly:

```bash
./god-eye -d target.com --pipeline --enable-ai --multi-agent
```

| Agent    | Specialty                                    |
|----------|----------------------------------------------|
| XSS      | Cross-Site Scripting (DOM, Reflected, Stored) |
| SQLi     | SQL Injection (error, blind, time-based)     |
| Auth     | Auth bypass, IDOR, JWT, OAuth, SAML, session |
| API      | REST/GraphQL, CORS, rate limiting            |
| Crypto   | TLS / cipher issues, weak keys               |
| Secrets  | API keys, tokens, hardcoded credentials      |
| Headers  | CSP, HSTS, cookie flags, SameSite            |
| General  | Fallback for unclassified findings           |

How it works:

1. A **coordinator** agent classifies each raw finding (regex + short LLM call)
2. Routes it to the appropriate specialist
3. Specialist analyzes with domain-specific knowledge + OWASP-aligned remediation templates
4. Emits an `AIFinding` event with confidence score

This is a v1-era implementation. **Phase 3 (in progress)** introduces native Planner/Worker agents with tool calls; see `internal/agent/` for the evolving interfaces.
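The regex half of the coordinator's classification step might look like this. The patterns and the `classify` function are illustrative assumptions, not the shipped v1 rules; anything they miss falls back to `General` here, where the real coordinator would make its short LLM call.

```go
package main

import (
	"fmt"
	"regexp"
)

// route pairs a specialist agent with a pattern over the raw finding text.
type route struct {
	agent string
	re    *regexp.Regexp
}

// routes is a hypothetical first-pass classifier; the first match wins.
var routes = []route{
	{"Secrets", regexp.MustCompile(`(?i)api[_-]?key|secret|token|password|AKIA[0-9A-Z]{16}`)},
	{"SQLi", regexp.MustCompile(`(?i)sql syntax|sqlstate|ora-\d{5}|union\s+select`)},
	{"XSS", regexp.MustCompile(`(?i)<script|onerror=|innerhtml|document\.write`)},
	{"Auth", regexp.MustCompile(`(?i)jwt|oauth|saml|idor|session`)},
	{"API", regexp.MustCompile(`(?i)graphql|cors|access-control-allow-origin|rate.?limit`)},
	{"Crypto", regexp.MustCompile(`(?i)tls|ssl|cipher|rc4|sha1`)},
	{"Headers", regexp.MustCompile(`(?i)content-security-policy|strict-transport-security|samesite|httponly`)},
}

// classify returns the specialist for a finding, or "General" if no
// pattern matches (where the real coordinator would ask the LLM).
func classify(finding string) string {
	for _, r := range routes {
		if r.re.MatchString(finding) {
			return r.agent
		}
	}
	return "General"
}

func main() {
	for _, f := range []string{
		"JWT uses alg=none",
		"Missing Strict-Transport-Security header",
		"Server banner: nginx/1.18.0",
	} {
		fmt.Printf("%-45s → %s\n", f, classify(f))
	}
}
```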

---

## CVE matching

Two-layer CVE detection:

1. **Offline KEV (CISA Known Exploited Vulnerabilities)** — ~1400 actively exploited CVEs, auto-downloaded to `~/.god-eye/kev.json` on first AI-enabled scan. Instant lookups, no network.
2. **NVD API (fallback)** — full CVE database, queried via function-calling from the deep model when the detected tech+version doesn't match KEV.

Update the KEV cache manually any time:

```bash
./god-eye update-db
./god-eye db-info
```

CVE matches emit an `eventbus.CVEMatch` event with the tech, version, severity, and KEV flag:

```
CRIT  CVE  nginx@1.18.0 → CVE-2021-23017
```

The match also appears in the host's JSON output:

```json
{
  "host": "nginx-internal.target.com",
  "technologies": ["nginx/1.18.0"],
  "cve_findings": ["CVE-2021-23017"]
}
```
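A sketch of the offline KEV lookup, assuming `~/.god-eye/kev.json` is CISA's catalog in its published JSON shape (a top-level `vulnerabilities` array whose entries carry `cveID`, `vendorProject` and `product`). KEV entries name products but carry no affected-version ranges, so this only narrows candidates by product; confirming the version is where an NVD-style lookup comes in. The helper names and the sample entries are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// kevEntry keeps three of the fields each KEV catalog entry carries.
type kevEntry struct {
	CveID         string `json:"cveID"`
	VendorProject string `json:"vendorProject"`
	Product       string `json:"product"`
}

type kevCatalog struct {
	Vulnerabilities []kevEntry `json:"vulnerabilities"`
}

// kevIndex maps a lower-cased product name to the KEV CVE IDs listed for it.
func kevIndex(data []byte) (map[string][]string, error) {
	var c kevCatalog
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	idx := make(map[string][]string)
	for _, v := range c.Vulnerabilities {
		p := strings.ToLower(v.Product)
		idx[p] = append(idx[p], v.CveID)
	}
	return idx, nil
}

// splitTech turns a banner such as "nginx/1.18.0" into product and version.
func splitTech(banner string) (product, version string) {
	product, version, _ = strings.Cut(banner, "/")
	return product, version
}

// kevCandidates returns KEV CVE IDs for the banner's product; an empty
// result is the point where an NVD-style lookup would take over.
func kevCandidates(idx map[string][]string, banner string) []string {
	product, _ := splitTech(banner)
	return idx[strings.ToLower(product)]
}

func main() {
	// Placeholder catalog: IDs and product are invented for the demo.
	sample := []byte(`{"vulnerabilities":[
		{"cveID":"CVE-0000-0001","vendorProject":"Example","product":"ExampleServer"},
		{"cveID":"CVE-0000-0002","vendorProject":"Example","product":"ExampleServer"}]}`)
	idx, err := kevIndex(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(kevCandidates(idx, "ExampleServer/2.4"))   // [CVE-0000-0001 CVE-0000-0002]
	fmt.Println(len(kevCandidates(idx, "nginx/1.18.0"))) // 0: fall back to NVD
}
```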

---

## Custom models + YAML config

Override the profile's choices per-scan:

```bash
./god-eye -d target.com --pipeline --enable-ai \
  --ai-fast-model qwen3:4b \
  --ai-deep-model qwen3-coder:30b
```

Or persist in YAML:

```yaml
# god-eye.yaml
profile: bugbounty

ai:
  enabled: true
  url: http://localhost:11434       # point at a remote Ollama if you have one
  fast_model: qwen3:4b               # triage
  deep_model: qwen3-coder:30b        # deep analysis (MoE)
  cascade: true
  deep: true                         # run deep on every finding, not just triaged ones
  multi_agent: true
```

The wizard does not write this file yet; for now, edit the YAML by hand. Having the wizard persist non-default profile choices is a planned enhancement.

---

## Troubleshooting

### "ollama not reachable at http://localhost:11434"

```bash
# Check the server is up
curl http://localhost:11434/api/tags

# If the port isn't listening
ollama serve &
```

If it's listening on a different host/port (e.g., remote machine):

```bash
./god-eye -d target.com --pipeline --enable-ai --ai-url http://10.0.0.10:11434
```

### "pull qwen3:1.7b: model not found"

Ollama can't resolve the tag. Make sure you're on an up-to-date Ollama — the registry changes names occasionally. Try:

```bash
ollama pull qwen3:1.7b
ollama list
```

If the pull works manually but god-eye fails, file an issue.

### Downloads hang at some percentage

Usually caused by a flaky connection to the Ollama registry. Ollama resumes partial downloads, so stop god-eye with Ctrl-C and rerun it; the pull picks up where the manifest/blob download left off.

### AI findings feel too hallucinated

Three levers:

1. Drop the temperature. Edit `internal/ai/ollama.go:query()` (`temperature: 0.3` → `0.1`).
2. Use a bigger triage model (`--ai-profile heavy`).
3. Disable the cascade (`--ai-cascade=false`) so every finding gets the deep model — slower but higher quality floor.
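For the temperature lever, this is roughly what an Ollama request with a lowered temperature looks like. The body follows Ollama's documented `POST /api/generate` format, where sampling settings go under `options`; whether `query()` uses `/api/generate` or `/api/chat`, and how it builds the body, is an assumption here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// generateRequest follows the body of Ollama's POST /api/generate.
type generateRequest struct {
	Model   string         `json:"model"`
	Prompt  string         `json:"prompt"`
	Stream  bool           `json:"stream"`
	Options map[string]any `json:"options,omitempty"`
}

// buildRequest marshals a non-streaming request with the given temperature.
func buildRequest(model, prompt string, temperature float64) ([]byte, error) {
	return json.Marshal(generateRequest{
		Model:   model,
		Prompt:  prompt,
		Stream:  false,
		Options: map[string]any{"temperature": temperature},
	})
}

func main() {
	body, err := buildRequest("qwen2.5-coder:14b", "Analyze this finding...", 0.1)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```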

### "deep model has low tok/sec on my MacBook Pro"

This is expected for a dense 14B model. Switch to the balanced profile: as long as it fits in memory, the 30B MoE model generates tokens **faster** than the dense 14B because only 3.3B parameters are active per token.

```bash
./god-eye --ai-profile balanced …
```

### High memory usage

Both models are loaded in Ollama when the scan starts. Options:

- Use the lean profile.
- Drop the deep model to `qwen2.5-coder:7b` (less capable but only ~5GB).
- Disable the cascade and use only the fast model: `--ai-cascade=false --ai-deep-model qwen3:1.7b`.

---

## Privacy & security model

- ✅ **Completely local**: Ollama runs on your machine, so prompts and responses never leave it.
- ✅ **Offline after pull**: once models are cached in `~/.ollama/`, the LLM needs no network. (The KEV download and the NVD API fallback described under CVE matching do go online.)
- ✅ **Open-source infrastructure**: Ollama (MIT), models under their respective open licenses.
- ✅ **No telemetry**: God's Eye doesn't phone home.
- ✅ **Free forever**: no API keys, no usage caps.

**What the AI layer sees**: excerpts of HTTP responses, JS file content, technology banners, and your target domain. Do NOT enable AI if your engagement terms forbid third-party tooling touching response bodies — even though the LLM is local, some agreements treat automated analysis separately.

---

## Performance reference

Measured on an Apple M1 Pro, 16GB RAM, `ollama serve` running alongside the scan.

### Lean cascade

| Finding type         | Triage latency | Deep latency | Total |
|----------------------|----------------|--------------|-------|
| Short HTTP response  | 0.6s           | 4.1s         | 4.7s  |
| Medium JS file (8KB) | 0.9s           | 9.3s         | 10.2s |
| Large JS bundle (64KB, truncated) | 1.1s | 14.2s     | 15.3s |

### Balanced cascade (MoE)

| Finding type         | Triage | Deep   | Total  |
|----------------------|--------|--------|--------|
| Short HTTP response  | 0.8s   | 3.2s   | 4.0s   |
| Medium JS (8KB)      | 1.2s   | 7.1s   | 8.3s   |
| Large JS (64KB)      | 1.5s   | 10.8s  | 12.3s  |

Net effect: balanced is roughly 22–24% faster on the deep step (for example 10.8s vs 14.2s on a 64KB bundle) despite producing higher-quality findings, because the MoE architecture activates only 3.3B parameters per token.
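The deep-step comparison can be recomputed from the two tables above. This is plain arithmetic on the published latencies, not a new measurement.

```go
package main

import "fmt"

// speedup returns how much less time b takes than a, as a fraction of a.
func speedup(a, b float64) float64 { return (a - b) / a }

func main() {
	rows := []struct {
		name      string
		lean, moe float64 // deep-step seconds: lean table vs balanced table
	}{
		{"Short HTTP response", 4.1, 3.2},
		{"Medium JS (8KB)", 9.3, 7.1},
		{"Large JS (64KB)", 14.2, 10.8},
	}
	for _, r := range rows {
		fmt.Printf("%-20s %.0f%% faster\n", r.name, 100*speedup(r.lean, r.moe))
	}
}
```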

### Scan-level benchmarks

See [BENCHMARK.md](BENCHMARK.md) for end-to-end scan times across profiles and target sizes.

---

## Reference — every AI-related flag

| Flag                  | Default                | Description                                           |
|-----------------------|------------------------|-------------------------------------------------------|
| `--enable-ai`         | `false`                | Turn on the AI layer                                  |
| `--ai-profile`        | `""` (uses individual flags) | Preset tier: `lean`/`balanced`/`heavy`          |
| `--ai-url`            | `http://localhost:11434` | Ollama API URL                                      |
| `--ai-fast-model`     | `qwen3:1.7b`           | Triage model (Ollama tag)                             |
| `--ai-deep-model`     | `qwen2.5-coder:14b`    | Deep-analysis model (Ollama tag)                      |
| `--ai-cascade`        | `true`                 | Use fast → deep cascade                               |
| `--ai-deep`           | `false`                | Run deep on every finding, skipping triage filter     |
| `--multi-agent`       | `false`                | Enable 8-agent specialized orchestration              |
| `--ai-verbose`        | `false`                | Log every Ollama query on stderr                      |
| `--ai-auto-pull`      | `true`                 | Download missing models at startup                    |

Every flag has a matching YAML key under `ai:` in the config file (see the `god-eye.yaml` example above).
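A sketch of how these flags and their defaults could be declared with Go's standard `flag` package, which accepts both `-flag` and `--flag`. `aiFlags` and `parseAIFlags` are hypothetical; God's Eye's real CLI wiring may use a different library.

```go
package main

import (
	"flag"
	"fmt"
)

// aiFlags mirrors the reference table above (hypothetical struct).
type aiFlags struct {
	EnableAI   bool
	Profile    string
	URL        string
	FastModel  string
	DeepModel  string
	Cascade    bool
	Deep       bool
	MultiAgent bool
	Verbose    bool
	AutoPull   bool
}

// parseAIFlags registers each flag with the table's default and parses args.
func parseAIFlags(args []string) (aiFlags, error) {
	var f aiFlags
	fs := flag.NewFlagSet("god-eye", flag.ContinueOnError)
	fs.BoolVar(&f.EnableAI, "enable-ai", false, "Turn on the AI layer")
	fs.StringVar(&f.Profile, "ai-profile", "", "Preset tier: lean/balanced/heavy")
	fs.StringVar(&f.URL, "ai-url", "http://localhost:11434", "Ollama API URL")
	fs.StringVar(&f.FastModel, "ai-fast-model", "qwen3:1.7b", "Triage model (Ollama tag)")
	fs.StringVar(&f.DeepModel, "ai-deep-model", "qwen2.5-coder:14b", "Deep-analysis model (Ollama tag)")
	fs.BoolVar(&f.Cascade, "ai-cascade", true, "Use fast -> deep cascade")
	fs.BoolVar(&f.Deep, "ai-deep", false, "Run deep on every finding, skipping triage")
	fs.BoolVar(&f.MultiAgent, "multi-agent", false, "Enable 8-agent orchestration")
	fs.BoolVar(&f.Verbose, "ai-verbose", false, "Log every Ollama query on stderr")
	fs.BoolVar(&f.AutoPull, "ai-auto-pull", true, "Download missing models at startup")
	err := fs.Parse(args)
	return f, err
}

func main() {
	f, err := parseAIFlags([]string{"--enable-ai", "--ai-profile", "balanced"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", f)
}
```

With this package, boolean flags need the `=` form to be switched off (`--ai-cascade=false`), which matches the examples in this guide.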