feat: add Windows NRPT and WFP DNS interception

This commit is contained in:
Codescribe
2026-03-05 04:50:16 -05:00
committed by Cuong Manh Le
parent 3442331695
commit b9fb3b9176
3 changed files with 2266 additions and 0 deletions

File diff suppressed because it is too large Load Diff

449
docs/wfp-dns-intercept.md Normal file
View File

@@ -0,0 +1,449 @@
# Windows DNS Intercept — Technical Reference
## Overview
On Windows, DNS intercept mode uses a two-layer architecture:
- **`dns` mode (default)**: NRPT only — graceful DNS routing via the Windows DNS Client service
- **`hard` mode**: NRPT + WFP — full enforcement with kernel-level block filters
This dual-mode design ensures that `dns` mode can never break DNS (at worst, a VPN
overwrites NRPT and queries bypass ctrld temporarily), while `hard` mode provides
the same enforcement guarantees as macOS pf.
## Architecture: dns vs hard Mode
```
┌─────────────────────────────────────────────────────────────────┐
│ dns mode (NRPT only) │
│ │
│ App DNS query → DNS Client service → NRPT lookup │
│ → "." catch-all matches → forward to 127.0.0.1 (ctrld) │
│ │
│ If VPN clears NRPT: health monitor re-adds within 30s │
│ Worst case: queries go to VPN DNS until NRPT restored │
│ DNS never breaks — graceful degradation │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ hard mode (NRPT + WFP) │
│ │
│ App DNS query → DNS Client service → NRPT → 127.0.0.1 (ctrld)│
│ │
│ Bypass attempt (raw 8.8.8.8:53) → WFP BLOCK filter │
│ VPN DNS on private IP → WFP subnet PERMIT filter → allowed │
│ │
│ NRPT must be active before WFP starts (atomic guarantee) │
│ If NRPT fails → WFP not started (avoids DNS blackhole) │
│ If WFP fails → NRPT rolled back (all-or-nothing) │
└─────────────────────────────────────────────────────────────────┘
```
## NRPT (Name Resolution Policy Table)
### What It Does
NRPT is a Windows feature (originally for DirectAccess) that tells the DNS Client
service to route queries matching specific namespace patterns to specific DNS servers.
ctrld adds a catch-all rule that routes ALL DNS to `127.0.0.1`:
| Registry Value | Type | Value | Purpose |
|---|---|---|---|
| `Name` | REG_MULTI_SZ | `.` | Namespace (`.` = catch-all) |
| `GenericDNSServers` | REG_SZ | `127.0.0.1` | Target DNS server |
| `ConfigOptions` | REG_DWORD | `0x8` | Standard DNS resolution |
| `Version` | REG_DWORD | `0x2` | NRPT rule version 2 |
| `Comment` | REG_SZ | `` | Empty (matches PowerShell behavior) |
| `DisplayName` | REG_SZ | `` | Empty (matches PowerShell behavior) |
| `IPSECCARestriction` | REG_SZ | `` | Empty (matches PowerShell behavior) |
### Registry Paths — GP vs Local (Critical)
Windows NRPT has two registry paths with **all-or-nothing** precedence:
| Path | Name | Mode |
|---|---|---|
| `HKLM\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient\DnsPolicyConfig` | **GP path** | Group Policy mode |
| `HKLM\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters\DnsPolicyConfig` | **Local path** | Local/service store mode |
**Precedence rule**: If ANY rules exist in the GP path (from IT policy, VPN, MDM,
or our own earlier builds), DNS Client enters "GP mode" and **ignores ALL local-path
rules entirely**. This is not per-rule — it's a binary switch.
**Consequence**: On non-domain-joined (WORKGROUP) machines, `RefreshPolicyEx` is
unreliable. If we write to the GP path, DNS Client enters GP mode but the rules
never activate — resulting in `Get-DnsClientNrptPolicy` returning empty even though
`Get-DnsClientNrptRule` shows the rule in registry.
ctrld uses an adaptive strategy (matching [Tailscale's approach](https://github.com/tailscale/tailscale/blob/main/net/dns/nrpt_windows.go)):
1. **Always write to the local path** using a deterministic GUID key name
(`{B2E9A3C1-7F4D-4A8E-9D6B-5C1E0F3A2B8D}`). This is the baseline that works
on all non-domain machines.
2. **Check if other software has GP NRPT rules** (`otherGPRulesExist()`). If
foreign GP rules are present (IT policy, VPN), DNS Client is already in GP mode
and our local rule would be invisible — so we also write to the GP path.
3. **If no foreign GP rules exist**, clean any stale ctrld GP rules and delete
the empty GP parent key. This ensures DNS Client stays in "local mode" where
the local-path rule activates immediately via `paramchange`.
### VPN Coexistence
NRPT uses most-specific-match. VPN NRPT rules for specific domains (e.g.,
`*.corp.local` → `10.20.30.1`) take priority over ctrld's `.` catch-all.
This means VPN split DNS works naturally — VPN-specific domains go to VPN DNS,
everything else goes to ctrld. No exemptions or special handling needed.
### DNS Client Notification
After writing NRPT rules, DNS Client must be notified to reload:
1. **`paramchange`**: `sc control dnscache paramchange` — signals DNS Client to
re-read configuration. Works for local-path rules on most machines.
2. **`RefreshPolicyEx`**: `RefreshPolicyEx(bMachine=TRUE, dwOptions=RP_FORCE)` from
`userenv.dll` — triggers GP refresh for GP-path rules. Unreliable on non-domain
machines (WORKGROUP). Fallback: `gpupdate /target:computer /force`.
3. **DNS cache flush**: `DnsFlushResolverCache` from `dnsapi.dll` or `ipconfig /flushdns`
— clears stale cached results from before NRPT was active.
### DNS Cache Flush
After NRPT changes, stale DNS cache entries could bypass the new routing. ctrld flushes:
1. **Primary**: `DnsFlushResolverCache` from `dnsapi.dll`
2. **Fallback**: `ipconfig /flushdns` (subprocess)
### Known Limitation: nslookup
`nslookup.exe` implements its own DNS resolver and does NOT use the Windows DNS Client
service. It ignores NRPT entirely. Use `Resolve-DnsName` (PowerShell) or `ping` to
verify DNS resolution through NRPT. This is a well-known Windows behavior.
## WFP (Windows Filtering Platform) — hard Mode Only
### Filter Stack
```
┌─────────────────────────────────────────────────────────────────┐
│ Sublayer: "ctrld DNS Intercept" (weight 0xFFFF — max priority) │
│ │
│ ┌─ Permit Filters (weight 10) ─────────────────────────────┐ │
│ │ • IPv4/UDP to 127.0.0.1:53 → PERMIT │ │
│ │ • IPv4/TCP to 127.0.0.1:53 → PERMIT │ │
│ │ • IPv6/UDP to ::1:53 → PERMIT │ │
│ │ • IPv6/TCP to ::1:53 → PERMIT │ │
│ │ • RFC1918 + CGNAT subnets:53 → PERMIT (VPN DNS) │ │
│ │ • VPN DNS exemptions (dynamic) → PERMIT │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌─ Block Filters (weight 1) ───────────────────────────────┐ │
│ │ • All IPv4/UDP to *:53 → BLOCK │ │
│ │ • All IPv4/TCP to *:53 → BLOCK │ │
│ │ • All IPv6/UDP to *:53 → BLOCK │ │
│ │ • All IPv6/TCP to *:53 → BLOCK │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ Filter evaluation: higher weight wins → permits checked first │
└─────────────────────────────────────────────────────────────────┘
```
### Why WFP Can't Work Alone
WFP operates at the connection authorization layer (`FWPM_LAYER_ALE_AUTH_CONNECT`).
It can only **block** or **permit** connections — it **cannot redirect** them.
Redirection requires kernel-mode callout drivers (`FwpsCalloutRegister` in
`fwpkclnt.lib`) using `FWPM_LAYER_ALE_CONNECT_REDIRECT_V4/V6`, which are not
accessible from userspace.
Without NRPT, WFP blocks outbound DNS but doesn't tell applications where to send
queries instead — they just see DNS failures. This is why `hard` mode requires NRPT
to be active first, and why WFP is rolled back if NRPT setup fails.
### Sublayer Priority
Weight `0xFFFF` (maximum) ensures ctrld's filters take priority over any other WFP
sublayers from VPN software, endpoint security, or Windows Defender Firewall.
### RFC1918 + CGNAT Subnet Permits
Static permit filters for private IP ranges (10.0.0.0/8, 172.16.0.0/12,
192.168.0.0/16, 100.64.0.0/10) allow VPN DNS servers on private IPs to work
without dynamic per-server exemptions. This covers Tailscale MagicDNS
(100.100.100.100), corporate VPN DNS (10.x.x.x), and similar.
### VPN DNS Exemption Updates
When `vpnDNSManager.Refresh()` discovers VPN DNS servers on public IPs:
1. Delete all existing VPN permit filters (by stored IDs)
2. For each VPN DNS server IP:
- IPv4: `addWFPPermitIPFilter()` on `ALE_AUTH_CONNECT_V4`
- IPv6: `addWFPPermitIPv6Filter()` on `ALE_AUTH_CONNECT_V6`
- Both UDP and TCP for each IP
3. Store new filter IDs for next cleanup cycle
**In `dns` mode, VPN DNS exemptions are skipped** — there are no WFP block
filters to exempt from.
### Session Lifecycle
**Startup (hard mode):**
```
1. Add NRPT catch-all rule + GP refresh + DNS flush
2. FwpmEngineOpen0() with RPC_C_AUTHN_DEFAULT (0xFFFFFFFF)
3. Delete stale sublayer (crash recovery)
4. FwpmSubLayerAdd0() — weight 0xFFFF
5. Add 4 localhost permit filters
6. Add 4 block filters
7. Add RFC1918 + CGNAT subnet permits
8. Start NRPT health monitor goroutine
```
**Startup (dns mode):**
```
1. Add NRPT catch-all rule + GP refresh + DNS flush
2. Start NRPT health monitor goroutine
3. (No WFP — done)
```
**Shutdown:**
```
1. Stop NRPT health monitor
2. Remove NRPT catch-all rule + DNS flush
3. (hard mode only) Clean up all WFP filters, sublayer, close engine
```
**Crash Recovery:**
On startup, `FwpmSubLayerDeleteByKey0` removes any stale sublayer from a previous
unclean shutdown, including all its child filters (deterministic GUID ensures we
only clean up our own).
## NRPT Probe and Auto-Heal
### The Problem: Async GP Refresh Race
`RefreshPolicyEx` triggers a Group Policy refresh but returns immediately — it does
NOT wait for the DNS Client service to actually reload NRPT from the registry. On
cold machines (first boot, fresh install, long sleep), the DNS Client may take
several seconds to process the policy refresh. During this window, NRPT rules exist
in the registry but the DNS Client hasn't loaded them — queries bypass ctrld.
### The Solution: Active Probing
After writing NRPT to the registry, ctrld sends a probe DNS query through the
Windows DNS Client path to verify NRPT is actually working:
1. Generate a unique probe domain: `_nrpt-probe-<hex>.nrpt-probe.ctrld.test`
2. Send it via Go's `net.Resolver` (calls `GetAddrInfoW` → DNS Client → NRPT)
3. If NRPT is active, DNS Client routes it to 127.0.0.1 → ctrld receives it
4. ctrld's DNS handler recognizes the probe prefix and signals success
5. If the probe times out (2s), NRPT isn't loaded yet → retry with remediation
### Startup Probe (Async)
After NRPT setup, an async goroutine runs the probe-and-heal sequence without
blocking startup:
```
Probe attempt 1 (2s timeout)
├─ Success → "NRPT verified working", done
└─ Timeout → GP refresh + DNS flush, sleep 1s
Probe attempt 2 (2s timeout)
├─ Success → done
└─ Timeout → Restart DNS Client service (nuclear), sleep 2s
Re-add NRPT + GP refresh + DNS flush
Probe attempt 3 (2s timeout)
├─ Success → done
└─ Timeout → GP refresh + DNS flush, sleep 4s
Probe attempt 4 (2s timeout)
├─ Success → done
└─ Timeout → log error, continue
```
### DNS Client Restart (Nuclear Option)
If GP refresh alone isn't enough, ctrld restarts the Windows DNS Client service
(`Dnscache`). This forces the DNS Client to fully re-initialize, including
re-reading all NRPT rules from the registry. This is the equivalent of macOS
`forceReloadPFMainRuleset()`.
**Trade-offs:**
- Briefly interrupts ALL DNS resolution (few hundred ms during restart)
- Clears the system DNS cache (all apps need to re-resolve)
- VPN NRPT rules survive (they're in registry, re-read on restart)
- Enterprise security tools may log the service restart event
This only fires as attempt #3 after two GP refresh attempts fail — at that point
DNS isn't working through ctrld anyway, so a brief DNS blip is acceptable.
### Health Monitor Integration
The 30s periodic health monitor now does actual probing, not just registry checks:
```
Every 30s:
├─ Registry check: nrptCatchAllRuleExists()?
│ ├─ Missing → re-add + GP refresh + flush + probe-and-heal
│ └─ Present → probe to verify it's actually routing
│ ├─ Probe success → OK
│ └─ Probe failure → probe-and-heal cycle
└─ (hard mode only) Check: wfpSublayerExists()?
├─ Missing → full restart (stopDNSIntercept + startDNSIntercept)
└─ Present → OK
```
**Singleton guard:** Only one probe-and-heal sequence runs at a time (atomic bool).
The startup probe and health monitor cannot overlap.
**Why periodic, not just network-event?** VPN software or Group Policy updates can
clear NRPT at any time, not just during network changes. A 30s periodic check ensures
recovery within a bounded window.
**Hard mode safety:** The health monitor verifies NRPT before checking WFP. If NRPT
is gone, it's restored first. WFP is never running without NRPT — this prevents
DNS blackholes where WFP blocks everything but NRPT isn't routing to ctrld.
## DNS Flow Diagrams
### Normal Resolution (both modes)
```
App → DNS Client → NRPT lookup → "." matches → 127.0.0.1 → ctrld
→ Control D DoH (port 443, not affected by WFP port-53 rules)
→ response flows back
```
### VPN Split DNS (both modes)
```
App → DNS Client → NRPT lookup:
VPN domain (*.corp.local) → VPN's NRPT rule wins → VPN DNS server
Everything else → ctrld's "." catch-all → 127.0.0.1 → ctrld
→ VPN domain match → forward to VPN DNS (port 53)
→ (hard mode: WFP subnet permit allows private IP DNS)
```
### Bypass Attempt (hard mode only)
```
App → raw socket to 8.8.8.8:53 → WFP ALE_AUTH_CONNECT → BLOCK
```
In `dns` mode, this query would succeed (no WFP) — the tradeoff for never
breaking DNS.
## Key Differences from macOS (pf)
| Aspect | macOS (pf) | Windows dns mode | Windows hard mode |
|--------|-----------|------------------|-------------------|
| **Routing** | `rdr` redirect | NRPT policy | NRPT policy |
| **Enforcement** | `route-to` + block rules | None (graceful) | WFP block filters |
| **Can break DNS?** | Yes (pf corruption) | No | Yes (if NRPT lost) |
| **VPN coexistence** | Watchdog + stabilization | NRPT most-specific-match | Same + WFP permits |
| **Bypass protection** | pf catches all packets | None | WFP catches all connections |
| **Recovery** | Probe + auto-heal | Health monitor re-adds | Full restart on sublayer loss |
## WFP API Notes
### Struct Layouts
WFP C API structures are manually defined in Go (`golang.org/x/sys/windows` doesn't
include WFP types). Field alignment must match the C ABI exactly — any mismatch
causes access violations or silent corruption.
### FWP_DATA_TYPE Enum
```
FWP_EMPTY = 0
FWP_UINT8 = 1
FWP_UINT16 = 2
FWP_UINT32 = 3
FWP_UINT64 = 4
...
```
**⚠️** Some documentation examples incorrectly start at 1. The enum starts at 0
(`FWP_EMPTY`), making all subsequent values offset by 1 from what you might expect.
### GC Safety
When passing Go heap objects to WFP syscalls via `unsafe.Pointer`, use
`runtime.KeepAlive()` to prevent garbage collection during the call:
```go
conditions := make([]fwpmFilterCondition0, 3)
filter.filterCondition = &conditions[0]
r1, _, _ := procFwpmFilterAdd0.Call(...)
runtime.KeepAlive(conditions)
```
### Authentication
`FwpmEngineOpen0` requires `RPC_C_AUTHN_DEFAULT` (0xFFFFFFFF) for the authentication
service parameter. `RPC_C_AUTHN_NONE` (0) returns `ERROR_NOT_SUPPORTED` on some
configurations (e.g., Parallels VMs).
### Elevation
WFP requires admin/SYSTEM privileges. `FwpmEngineOpen0` fails with HRESULT 0x32
when run non-elevated. Services running as SYSTEM have this automatically.
## Debugging
### Check NRPT Rules
```powershell
# PowerShell — show active NRPT rules
Get-DnsClientNrptRule
# Check registry directly
Get-ChildItem "HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient\DnsPolicyConfig"
```
### Check WFP Filters (hard mode)
```powershell
# Show all WFP filters (requires admin) — output is XML
netsh wfp show filters
# Search for ctrld's filters
Select-String "ctrld" filters.xml
```
### Verify DNS Resolution
```powershell
# Use Resolve-DnsName, NOT nslookup (nslookup bypasses NRPT)
Resolve-DnsName example.com
ping example.com
# If you must use nslookup, specify localhost:
nslookup example.com 127.0.0.1
# Force GP refresh (if NRPT not loading)
gpupdate /target:computer /force
# Verify service registration
sc qc ctrld
```
### Service Verification
After install, verify the Windows service is correctly registered:
```powershell
# Check binary path and start type
sc qc ctrld
# Should show:
# BINARY_PATH_NAME: "C:\...\ctrld.exe" run --cd xxxxx --intercept-mode dns
# START_TYPE: AUTO_START
```
## Related
- [DNS Intercept Mode Overview](dns-intercept-mode.md) — cross-platform documentation
- [pf DNS Intercept](pf-dns-intercept.md) — macOS technical reference
- [Microsoft WFP Documentation](https://docs.microsoft.com/en-us/windows/win32/fwp/windows-filtering-platform-start-page)
- [Microsoft NRPT Documentation](https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/dn593632(v=ws.11))

132
scripts/nrpt-diag.ps1 Normal file
View File

@@ -0,0 +1,132 @@
#Requires -RunAsAdministrator
<#
.SYNOPSIS
NRPT diagnostic script for ctrld DNS intercept troubleshooting.
.DESCRIPTION
Captures the full NRPT state: registry keys (both GP and direct paths),
effective policy, active rules, DNS Client service status, and resolver
config. Run as Administrator.
.EXAMPLE
.\nrpt-diag.ps1
.\nrpt-diag.ps1 | Out-File nrpt-diag-output.txt
#>
$ErrorActionPreference = 'SilentlyContinue'
Write-Host "=== NRPT Diagnostic Report ===" -ForegroundColor Cyan
Write-Host "Date: $(Get-Date -Format 'yyyy-MM-dd HH:mm:ss')"
Write-Host "Computer: $env:COMPUTERNAME"
Write-Host "OS: $((Get-CimInstance Win32_OperatingSystem).Caption) $((Get-CimInstance Win32_OperatingSystem).BuildNumber)"
Write-Host ""
# --- 1. DNS Client Service ---
Write-Host "=== 1. DNS Client (Dnscache) Service ===" -ForegroundColor Yellow
$svc = Get-Service Dnscache
Write-Host "Status: $($svc.Status) StartType: $($svc.StartType)"
Write-Host ""
# --- 2. GP Path (Policy store) ---
$gpPath = "HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient\DnsPolicyConfig"
Write-Host "=== 2. GP Path: $gpPath ===" -ForegroundColor Yellow
$gpKey = Get-Item $gpPath 2>$null
if ($gpKey) {
Write-Host "Key EXISTS"
$subkeys = Get-ChildItem $gpPath 2>$null
if ($subkeys) {
foreach ($sk in $subkeys) {
Write-Host ""
Write-Host " Subkey: $($sk.PSChildName)" -ForegroundColor Green
foreach ($prop in $sk.Property) {
$val = $sk.GetValue($prop)
$kind = $sk.GetValueKind($prop)
Write-Host " $prop ($kind) = $val"
}
}
} else {
Write-Host " ** EMPTY (no subkeys) — this blocks NRPT activation! **" -ForegroundColor Red
}
} else {
Write-Host "Key does NOT exist (clean state)"
}
Write-Host ""
# --- 3. Direct Path (Service store) ---
$directPath = "HKLM:\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters\DnsPolicyConfig"
Write-Host "=== 3. Direct Path: $directPath ===" -ForegroundColor Yellow
$directKey = Get-Item $directPath 2>$null
if ($directKey) {
Write-Host "Key EXISTS"
$subkeys = Get-ChildItem $directPath 2>$null
if ($subkeys) {
foreach ($sk in $subkeys) {
Write-Host ""
Write-Host " Subkey: $($sk.PSChildName)" -ForegroundColor Green
foreach ($prop in $sk.Property) {
$val = $sk.GetValue($prop)
$kind = $sk.GetValueKind($prop)
Write-Host " $prop ($kind) = $val"
}
}
} else {
Write-Host " ** EMPTY (no subkeys) **" -ForegroundColor Red
}
} else {
Write-Host "Key does NOT exist"
}
Write-Host ""
# --- 4. Effective NRPT Rules (what Windows sees) ---
Write-Host "=== 4. Get-DnsClientNrptRule ===" -ForegroundColor Yellow
$rules = Get-DnsClientNrptRule 2>$null
if ($rules) {
$rules | Format-List Name, Version, Namespace, NameServers, NameEncoding, DnsSecEnabled
} else {
Write-Host "(none)"
}
Write-Host ""
# --- 5. Effective NRPT Policy (what DNS Client actually applies) ---
Write-Host "=== 5. Get-DnsClientNrptPolicy ===" -ForegroundColor Yellow
$policy = Get-DnsClientNrptPolicy 2>$null
if ($policy) {
$policy | Format-List Namespace, NameServers, NameEncoding, QueryPolicy
} else {
Write-Host "(none — DNS Client is NOT honoring any NRPT rules)" -ForegroundColor Red
}
Write-Host ""
# --- 6. Interface DNS servers ---
Write-Host "=== 6. Interface DNS Configuration ===" -ForegroundColor Yellow
Get-DnsClientServerAddress -AddressFamily IPv4 | Where-Object { $_.ServerAddresses } |
Format-Table InterfaceAlias, InterfaceIndex, ServerAddresses -AutoSize
Write-Host ""
# --- 7. DNS resolution test ---
Write-Host "=== 7. DNS Resolution Test ===" -ForegroundColor Yellow
Write-Host "Resolve-DnsName example.com (uses DNS Client / NRPT):"
try {
$result = Resolve-DnsName example.com -Type A -DnsOnly -ErrorAction Stop
$result | Format-Table Name, Type, IPAddress -AutoSize
} catch {
Write-Host " FAILED: $_" -ForegroundColor Red
}
Write-Host ""
Write-Host "nslookup example.com 127.0.0.1 (direct to ctrld, bypasses NRPT):"
$ns = nslookup example.com 127.0.0.1 2>&1
$ns | ForEach-Object { Write-Host " $_" }
Write-Host ""
# --- 8. Domain join status ---
Write-Host "=== 8. Domain Status ===" -ForegroundColor Yellow
$cs = Get-CimInstance Win32_ComputerSystem
Write-Host "Domain: $($cs.Domain) PartOfDomain: $($cs.PartOfDomain)"
Write-Host ""
# --- 9. Group Policy NRPT ---
Write-Host "=== 9. GP Result (NRPT section) ===" -ForegroundColor Yellow
Write-Host "(Running gpresult — may take a few seconds...)"
$gp = gpresult /r 2>&1
$gp | Select-String -Pattern "DNS|NRPT|Policy" | ForEach-Object { Write-Host " $_" }
Write-Host ""
Write-Host "=== End of Diagnostic Report ===" -ForegroundColor Cyan