fix: use raw IPv6 socket for DNS responses in macOS intercept mode

macOS rejects sendmsg from [::1] to a global unicast IPv6 address (EINVAL),
and nat on lo0 doesn't fire for route-to'd packets (pf skips translation
on the second interface pass). ULA addresses on lo0 also fail with
EHOSTUNREACH because the kernel segregates lo0 routing.

Solution: wrap the [::1] UDP listener's ResponseWriter with rawIPv6Writer
that sends responses via SOCK_RAW (IPPROTO_UDP) on lo0, bypassing the
kernel's routing validation. pf's rdr state reverses the address
translation on the response path.

Changes:
- Add rawipv6_darwin.go: rawIPv6Writer wraps dns.ResponseWriter, sends
  UDP responses via raw IPv6 socket with proper checksum calculation
- Add rawipv6_other.go: no-op wrapIPv6Handler for non-darwin platforms
- Remove nat rules from pf anchor (no longer needed)
- Block IPv6 TCP DNS (block return) - falls back to IPv4 (~1s, rare)
- Remove IPv6 TCP rdr/route-to/pass rules (only UDP intercepted)
Codescribe
2026-03-30 13:55:52 -04:00
committed by Cuong Manh Le
parent 95dd871e2d
commit 22a796f673
5 changed files with 227 additions and 78 deletions
+30 -28
@@ -17,7 +17,7 @@ options (set) → normalization (scrub) → queueing → translation (nat/rdr)
| Anchor Type | Section | Purpose |
|-------------|---------|---------|
| `scrub-anchor` | Normalization | Packet normalization |
| `nat-anchor` | Translation | NAT rules |
| `nat-anchor` | Translation | NAT rules (not used by ctrld) |
| `rdr-anchor` | Translation | Redirect rules |
| `anchor` | Filtering | Pass/block rules |
@@ -122,57 +122,60 @@ Three problems prevent a simple "mirror the IPv4 rules" approach:
3. **sendmsg from `[::1]` to global unicast fails**: Unlike IPv4 where the kernel allows `sendmsg` from `127.0.0.1` to local private IPs (e.g., `10.x.x.x`), macOS/BSD rejects `sendmsg` from `[::1]` to a global unicast IPv6 address with `EINVAL`. Since pf's `rdr` preserves the original source IP (the machine's global IPv6 address), ctrld's reply would fail.
### Solution: nat + rdr + [::1] Listener
### Solution: Raw Socket Response + rdr + [::1] Listener
**Key insight:** pf's `nat on lo0` doesn't fire for `route-to`'d packets (pf already ran the translation phase on the original outbound interface and skips it on lo0's outbound pass). `rdr` works because it fires on lo0's *inbound* side (a new direction after loopback reflection). So we can't use `nat` to rewrite the source, and any address bound to lo0 (including ULAs like `fd00:53::1`) can't send to global unicast addresses — the kernel segregates lo0's routing.
Instead, we use a **raw IPv6 socket** to send UDP responses. The `[::1]` listener receives queries normally via `rdr`, but responses are sent via `SOCK_RAW` with `IPPROTO_UDP`, bypassing the kernel's routing validation. The raw socket constructs the UDP packet (header + DNS payload) with correct checksums and sends it on lo0. pf matches the response against the `rdr` state table and reverse-translates the addresses.
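As a rough illustration of the "header + DNS payload with correct checksums" step, the sketch below builds a UDP datagram and computes its checksum over the IPv6 pseudo-header (source, destination, upper-layer length, next header 17). It is not the code in `rawipv6_darwin.go`; the function names (`buildUDP6`, `verifyUDP6`), the choice of 16-byte arrays for addresses, and the example addresses are assumptions made for this example. It also does not open a socket, and it does not guard against payloads larger than a UDP datagram allows.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// sum16 adds b to acc as big-endian 16-bit words (RFC 1071 style).
// A trailing odd byte is padded with zero, so only the last chunk may be odd.
func sum16(acc uint32, b []byte) uint32 {
	for ; len(b) >= 2; b = b[2:] {
		acc += uint32(binary.BigEndian.Uint16(b))
	}
	if len(b) == 1 {
		acc += uint32(b[0]) << 8
	}
	return acc
}

// fold reduces the accumulator to 16 bits with end-around carry.
func fold(acc uint32) uint16 {
	for acc>>16 != 0 {
		acc = (acc & 0xffff) + (acc >> 16)
	}
	return uint16(acc)
}

// pseudoSum sums the IPv6 pseudo-header used for upper-layer checksums.
func pseudoSum(src, dst [16]byte, udpLen int) uint32 {
	var ph [40]byte
	copy(ph[0:16], src[:])
	copy(ph[16:32], dst[:])
	binary.BigEndian.PutUint32(ph[32:36], uint32(udpLen))
	ph[39] = 17 // next header: UDP
	return sum16(0, ph[:])
}

// buildUDP6 (hypothetical helper) returns UDP header + payload with the
// checksum filled in. The IPv6 header itself is left to the kernel.
func buildUDP6(src [16]byte, srcPort uint16, dst [16]byte, dstPort uint16, payload []byte) []byte {
	pkt := make([]byte, 8+len(payload))
	binary.BigEndian.PutUint16(pkt[0:2], srcPort)
	binary.BigEndian.PutUint16(pkt[2:4], dstPort)
	binary.BigEndian.PutUint16(pkt[4:6], uint16(len(pkt)))
	copy(pkt[8:], payload)

	// Checksum field is still zero here, as the algorithm requires.
	csum := ^fold(sum16(pseudoSum(src, dst, len(pkt)), pkt))
	if csum == 0 {
		csum = 0xffff // a computed zero is transmitted as all ones
	}
	binary.BigEndian.PutUint16(pkt[6:8], csum)
	return pkt
}

// verifyUDP6 re-sums pseudo-header + datagram; a valid packet folds to 0xffff.
func verifyUDP6(src, dst [16]byte, pkt []byte) bool {
	return fold(sum16(pseudoSum(src, dst, len(pkt)), pkt)) == 0xffff
}

func main() {
	src := netip.MustParseAddr("::1").As16()
	dst := netip.MustParseAddr("2001:db8::2").As16() // documentation prefix, illustrative only
	pkt := buildUDP6(src, 53, dst, 40000, []byte("dns-payload"))
	fmt.Printf("%d bytes, checksum ok: %v\n", len(pkt), verifyUDP6(src, dst, pkt))
}
```

The checksum has to be computed in user space here because the sketch assumes the raw socket sends exactly the bytes it is given after the IPv6 header; whether the real writer relies on a kernel checksum option instead is not stated in this document.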
**IPv6 TCP DNS** is blocked (`block return`) and falls back to IPv4 — TCP DNS is rare (truncated responses, zone transfers) and raw socket injection for TCP would require managing the full TCP state machine.
```
# NAT: rewrite source to ::1 so ctrld can reply
nat on lo0 inet6 proto udp from ! ::1 to ! ::1 port 53 -> ::1
nat on lo0 inet6 proto tcp from ! ::1 to ! ::1 port 53 -> ::1
# RDR: redirect destination to ctrld's IPv6 listener
# RDR: redirect IPv6 UDP DNS to ctrld's listener (no nat needed)
rdr on lo0 inet6 proto udp from any to ! ::1 port 53 -> ::1 port 53
rdr on lo0 inet6 proto tcp from any to ! ::1 port 53 -> ::1 port 53
# Filter: route-to forces IPv6 DNS to loopback (mirrors IPv4 rules)
# Filter: route-to forces IPv6 UDP DNS to loopback
pass out quick on ! lo0 route-to lo0 inet6 proto udp from any to ! ::1 port 53
pass out quick on ! lo0 route-to lo0 inet6 proto tcp from any to ! ::1 port 53
# Block IPv6 TCP DNS — raw socket can't handle TCP; apps fall back to IPv4
block return out quick on ! lo0 inet6 proto tcp from any to ! ::1 port 53
# Pass on lo0 without state (mirrors IPv4)
pass out quick on lo0 inet6 proto udp from any to ! ::1 port 53 no state
pass out quick on lo0 inet6 proto tcp from any to ! ::1 port 53 no state
# Accept redirected IPv6 DNS with reply-to (mirrors IPv4)
pass in quick on lo0 reply-to lo0 inet6 proto { udp, tcp } from any to ::1 port 53
pass in quick on lo0 reply-to lo0 inet6 proto udp from any to ::1 port 53
```
### IPv6 Packet Flow
### IPv6 Packet Flow (UDP)
```
Application queries [2607:f0c8:8000:8210::1]:53 (IPv6 DNS server)
pf filter: "pass out route-to lo0 inet6 ... port 53" → redirects to lo0
pf filter: "pass out route-to lo0 inet6 proto udp ... port 53" → redirects to lo0
pf (outbound lo0): "pass out on lo0 inet6 ... no state" → passes
Loopback reflects packet inbound on lo0
pf nat: rewrites source 2607:f0c8:...:ec6e → ::1
pf rdr: rewrites dest [2607:f0c8:8000:8210::1]:53 → [::1]:53
(source remains: 2607:f0c8:...:ec6e — the machine's global IPv6)
ctrld receives query from [::1]:port → [::1]:53
ctrld receives query from [2607:f0c8:...:ec6e]:port → [::1]:53
ctrld resolves via DoH, replies to [::1]:port (kernel accepts ::1 → ::1)
ctrld resolves via DoH upstream
pf reverses both translations:
- nat reverse: dest ::1 → 2607:f0c8:...:ec6e (original client)
- rdr reverse: src ::1 → 2607:f0c8:8000:8210::1 (original DNS server)
Raw IPv6 socket sends response: [::1]:53 → [2607:f0c8:...:ec6e]:port
(bypasses kernel routing validation — raw socket on lo0)
pf reverses rdr: src [::1]:53 → [2607:f0c8:8000:8210::1]:53
Application receives response from [2607:f0c8:8000:8210::1]:53 ✓
```
### Client IP Recovery
The `nat` rewrites the source to `::1`, so ctrld sees the client as `::1` (loopback). The existing `spoofLoopbackIpInClientInfo()` logic detects this and replaces it with the machine's real RFC1918 IPv4 address (e.g., `10.0.10.211`). This is the same mechanism used when queries arrive from `127.0.0.1` — no client identity is lost.
pf's `rdr` preserves the original source (machine's global IPv6), so ctrld sees the real address. The existing `spoofLoopbackIpInClientInfo()` logic replaces loopback IPs with the machine's real RFC1918 IPv4 address for `X-Cd-Ip` reporting. For IPv6 intercepted queries, the source is already the real address — no spoofing needed.
### IPv6 Listener
@@ -180,12 +183,10 @@ The `[::1]` listener reuses the existing infrastructure from Windows (where it w
- **Windows**: Always (if IPv6 is available)
- **macOS**: Only in intercept mode
On macOS, the UDP handler is wrapped with `rawIPv6Writer` which intercepts `WriteMsg`/`Write` calls and sends responses via a raw IPv6 socket on lo0 instead of the normal `sendmsg` path.
If the `[::1]` listener fails to bind, it logs a warning and continues — the IPv4 listener is primary.
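The wrapping idea can be sketched as a decorator around the response writer. Everything below is hypothetical: `responseWriter` is a local stand-in for the subset of `dns.ResponseWriter` that matters here, `sketchWriter` is not the real `rawIPv6Writer`, and the `send` hook stands in for whatever hands the payload to the raw socket. The fallback to the wrapped writer for loopback and IPv4 peers is an assumption of this sketch, not something the document states. A `WriteMsg` method would presumably pack the message and then follow the same path as `Write`.

```go
package main

import (
	"fmt"
	"net"
)

// responseWriter is a hypothetical stand-in for the parts of
// dns.ResponseWriter this sketch needs.
type responseWriter interface {
	RemoteAddr() net.Addr
	Write(b []byte) (int, error)
}

// sketchWriter overrides Write and leaves everything else to the
// embedded writer.
type sketchWriter struct {
	responseWriter
	send func(dst *net.UDPAddr, payload []byte) error // assumed raw-socket hook
}

func (w *sketchWriter) Write(b []byte) (int, error) {
	dst, ok := w.RemoteAddr().(*net.UDPAddr)
	if !ok || dst.IP.To4() != nil || dst.IP.IsLoopback() {
		// Assumption: the normal sendmsg path works for these peers.
		return w.responseWriter.Write(b)
	}
	if err := w.send(dst, b); err != nil {
		return 0, err
	}
	return len(b), nil
}

// fakeWriter and recorder exist only to exercise the sketch.
type fakeWriter struct {
	remote net.Addr
	writes int
}

func (f *fakeWriter) RemoteAddr() net.Addr        { return f.remote }
func (f *fakeWriter) Write(b []byte) (int, error) { f.writes++; return len(b), nil }

func newFake(ip string, port int) *fakeWriter {
	return &fakeWriter{remote: &net.UDPAddr{IP: net.ParseIP(ip), Port: port}}
}

type recorder struct{ dsts []string }

func (r *recorder) send(dst *net.UDPAddr, _ []byte) error {
	r.dsts = append(r.dsts, dst.String())
	return nil
}

func main() {
	rec := &recorder{}
	global := newFake("2001:db8::10", 50000) // illustrative client address
	w := &sketchWriter{responseWriter: global, send: rec.send}
	n, err := w.Write([]byte("reply"))
	fmt.Println(n, err, rec.dsts, global.writes)
}
```

Embedding the interface keeps the wrapper small: only the write path changes, so the rest of the handler code does not need to know which transport carried the reply.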
### nat-anchor Requirement
The `nat` rules in our anchor require a `nat-anchor "com.controld.ctrld"` reference in the main pf ruleset, in addition to the existing `rdr-anchor` and `anchor` references. All pf management functions (inject, remove, verify, watchdog, force-reload) handle all three anchor types.
## Rule Ordering Within the Anchor
pf requires translation rules before filter rules, even within an anchor:
@@ -236,7 +237,7 @@ The trickiest part. macOS only processes anchors declared in the active pf rules
1. Read `/etc/pf.conf`
2. If our anchor reference already exists, reload as-is
3. Otherwise, inject `nat-anchor "com.controld.ctrld"` and `rdr-anchor "com.controld.ctrld"` in the translation section and `anchor "com.controld.ctrld"` in the filter section
3. Otherwise, inject `rdr-anchor "com.controld.ctrld"` in the translation section and `anchor "com.controld.ctrld"` in the filter section
4. Write to a **temp file** and load with `pfctl -f <tmpfile>`
5. **We never modify `/etc/pf.conf` on disk** — changes are runtime-only and don't survive reboot (ctrld re-injects on every start)
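Steps 2 and 3 amount to a small text transformation, sketched below. This is a minimal illustration, not ctrld's injection code: `injectAnchors` is a hypothetical name, the placement heuristic (before the first existing `rdr-anchor` line and before the first existing `anchor` line) is an assumption, and the sample input only resembles a stock macOS `pf.conf`. Writing the result to a temp file and calling `pfctl -f` is left out.

```go
package main

import (
	"fmt"
	"strings"
)

const anchorName = "com.controld.ctrld"

// injectAnchors returns conf with the rdr-anchor and anchor references
// added if missing, and reports whether anything changed.
func injectAnchors(conf string) (string, bool) {
	rdrRef := `rdr-anchor "` + anchorName + `"`
	filterRef := `anchor "` + anchorName + `"`

	lines := strings.Split(strings.TrimRight(conf, "\n"), "\n")
	needRdr, needFilter := true, true
	for _, l := range lines {
		// Compare whole lines: the filter reference is a substring
		// of the rdr reference, so a Contains check would misfire.
		switch strings.TrimSpace(l) {
		case rdrRef:
			needRdr = false
		case filterRef:
			needFilter = false
		}
	}
	if !needRdr && !needFilter {
		return conf, false // step 2: already present, reload as-is
	}

	out := make([]string, 0, len(lines)+2)
	for _, l := range lines {
		t := strings.TrimSpace(l)
		// Translation anchors must precede filter anchors, so the rdr
		// reference goes in no later than the first filter anchor.
		if needRdr && (strings.HasPrefix(t, "rdr-anchor ") || strings.HasPrefix(t, "anchor ")) {
			out = append(out, rdrRef)
			needRdr = false
		}
		if needFilter && strings.HasPrefix(t, "anchor ") {
			out = append(out, filterRef)
			needFilter = false
		}
		out = append(out, l)
	}
	// No suitable insertion point found: append in translation-then-filter order.
	if needRdr {
		out = append(out, rdrRef)
	}
	if needFilter {
		out = append(out, filterRef)
	}
	return strings.Join(out, "\n") + "\n", true
}

func main() {
	sample := "scrub-anchor \"com.apple/*\"\n" +
		"nat-anchor \"com.apple/*\"\n" +
		"rdr-anchor \"com.apple/*\"\n" +
		"dummynet-anchor \"com.apple/*\"\n" +
		"anchor \"com.apple/*\"\n" +
		"load anchor \"com.apple\" from \"/etc/pf.anchors/com.apple\"\n"
	out, changed := injectAnchors(sample)
	fmt.Print(out)
	fmt.Println("changed:", changed)
}
```

Because the function is idempotent, running it against an already-injected ruleset returns the input unchanged, which matches the "reload as-is" branch.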
@@ -376,5 +377,6 @@ We chose `route-to + rdr` as the best balance of effectiveness and deployability
9. **`pass out quick` exemptions work with route-to** — they fire in the same phase (filter), so `quick` + rule ordering means exempted packets never hit the route-to rule
10. **pf cannot cross-AF redirect**: `rdr on lo0 inet6 ... -> 127.0.0.1` is invalid. IPv6 DNS must be handled by an `[::1]` listener.
11. **`block return` doesn't work for IPv6 DNS**: BSD doesn't deliver ICMPv6 unreachable to unconnected UDP sockets (`sendto`). Apps time out waiting for a response that never comes.
12. **sendmsg from `::1` to global unicast fails on macOS** — unlike IPv4 where `127.0.0.1` can send to any local address, `::1` cannot send to the machine's own global IPv6 address. `nat` on lo0 is required to rewrite the source.
13. **`nat-anchor` is separate from `rdr-anchor`** — pf requires both in the main ruleset for nat and rdr rules in an anchor to be evaluated. `rdr-anchor` alone does not cover nat rules.
12. **sendmsg from `::1` to global unicast fails on macOS** — unlike IPv4 where `127.0.0.1` can send to any local address, `::1` cannot send to the machine's own global IPv6 address. Solved with raw socket response injection (SOCK_RAW + IPPROTO_UDP on lo0).
13. **`nat on lo0` doesn't fire for `route-to`'d packets** — pf runs translation on the original outbound interface (en0), then skips it on lo0's outbound pass. `rdr` works because lo0 inbound is a genuinely new direction. Any lo0 address (including ULAs) can't route to global unicast — the kernel segregates lo0's routing table.
14. **Raw IPv6 sockets bypass routing validation**: `SOCK_RAW` with `IPPROTO_UDP` can send from `::1` to global unicast on lo0, unlike normal `SOCK_DGRAM` sockets. The kernel doesn't apply the same routing checks for raw sockets.