The current code writes to a predictable path, which on systems without
`fs.protected_symlinks` (e.g. embedded routers) could allow a local
attacker with API compromise to perform symlink attacks.
Go's default is already TLS 1.2+ (since Go 1.18), but making this
explicit satisfies RFC 7858/9250 recommendations and makes the security
intent clear for auditors.
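
A minimal sketch of pinning the floor explicitly, assuming a helper that
builds the upstream TLS config (newUpstreamTLSConfig and the server name
are hypothetical, not the actual code):

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newUpstreamTLSConfig is a hypothetical constructor that sets an explicit
// minimum TLS version instead of relying on the crypto/tls default.
func newUpstreamTLSConfig(serverName string) *tls.Config {
	return &tls.Config{
		ServerName: serverName,
		// Explicit floor; TLS 1.3 is still negotiated when both sides support it.
		MinVersion: tls.VersionTLS12,
	}
}

func main() {
	cfg := newUpstreamTLSConfig("dns.example.com")
	fmt.Println(cfg.MinVersion == tls.VersionTLS12)
}
```
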
When WFP loopback protect is active, the upstream.os healthcheck will
always fail because an external WFP block filter is interfering with
plain DNS. This demotes those expected failures to debug level and
returns errOsHealthcheckSuppressed so the recovery loop treats them
as non-fatal, eliminating the log spam described in #526.
When third-party VPN software (e.g., OpenVPN) installs WFP block filters via
block-outside-dns, all DNS traffic to non-tunnel interfaces is blocked —
including DNS to 127.0.0.1 (ctrld's NRPT target). This breaks DNS mode
interception because the NRPT catch-all rule routes queries to loopback,
but WFP blocks the connection before it reaches ctrld's listener.
Fix: after exhausting all NRPT recovery attempts, activate a minimal WFP
session with "hard permit" filters (FWPM_FILTER_FLAG_CLEAR_ACTION_RIGHT)
for DNS to localhost in a max-priority sublayer (weight 0xFFFF). This
overrides the VPN's block for loopback DNS only, while preserving the
VPN's DNS leak protection for all other (non-loopback) DNS traffic.
The loopback protect is:
- Only activated when NRPT probes fail (not preemptively)
- Harmless when no conflicting WFP blocks exist (permit-only, no blocks)
- Persistent until ctrld shutdown (survives VPN reconnect cycles)
- Cleaned up by the existing cleanupWFPFilters path on shutdown
Add file-backed persistence to the internal logWriter so runtime logs
survive service restarts. When internal logging is enabled (CD mode,
no explicit log_path), writes are teed to both the existing in-memory
ring buffer and a rotated file on disk (ctrld.log in the home directory).
File rotation: 5MB max with 1 backup (ctrld.log.1), so max ~10MB on disk.
Log view/send now reads from the persisted files (including backup) to
provide complete history across restarts. Live tail continues to use
the in-memory subscriber mechanism unchanged.
Activation: same conditions as existing internal logging — CD mode only,
no log_path configured. No new config options or dependencies.
When multiple network changes fire in quick succession (e.g., VPN
disconnect + interface swap), the second handleRecovery() call cancels
the first but inherits stale DoH transports, causing DNS blackouts
of up to 30 seconds.
Three changes to reduce worst-case recovery from ~30s to <3s:
1. ForceReBootstrap() on recovery entry — closes dead connections and
creates fresh transports synchronously before probing, replacing the
lazy ReBootstrap() flag that left stale connections for probes to hit.
2. Debounce handleRecovery() for network changes (500ms window) — only
the recovery flow is debounced; all other state updates (IP, pf
anchor, VPN DNS, tunnel checks) still run immediately on every event.
This eliminates the cancel-and-restart race without missing state.
3. Combined effect: ForceReBootstrap closes old in-flight connections
(closeTransports) and builds new ones (SetupTransport) atomically,
so recovery probes never inherit dead connections from a prior
recovery attempt.
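
The debounce in item 2 can be sketched with time.AfterFunc. This is an
illustration only: the debouncer type and runBurst helper are hypothetical,
and the real code debounces handleRecovery() rather than a counter.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// debouncer coalesces a burst of trigger calls into one call of fn, made
// once no new trigger has arrived for the configured delay.
type debouncer struct {
	mu    sync.Mutex
	timer *time.Timer
	delay time.Duration
	fn    func()
}

func newDebouncer(delay time.Duration, fn func()) *debouncer {
	return &debouncer{delay: delay, fn: fn}
}

// trigger restarts the debounce window; only the last event in a burst
// results in fn being called.
func (d *debouncer) trigger() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop() // drop the pending run, the newest event wins
	}
	d.timer = time.AfterFunc(d.delay, d.fn)
}

// runBurst fires n back-to-back events and reports how many times fn ran.
func runBurst(n int, delay time.Duration) int32 {
	var runs atomic.Int32
	d := newDebouncer(delay, func() { runs.Add(1) })
	for i := 0; i < n; i++ {
		d.trigger()
	}
	time.Sleep(4 * delay) // wait well past the debounce window
	return runs.Load()
}

func main() {
	fmt.Println(runBurst(3, 50*time.Millisecond))
}
```

State updates that must not be delayed would run before trigger() is
called, so only the recovery flow waits for the window to close.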
README.md: fix Go version requirement (1.23 -> 1.24), update OS
support architectures (add arm64/mipsle/mips64 for Linux, arm64 for
Windows/FreeBSD, remove windows/arm), fix broken PowerShell install
path, demote H1 section headings to H2.
SetSelfIP unconditionally accessed t.dhcp, but t.dhcp is only
initialized when DHCP discovery is enabled. A network change event
can fire SetSelfIP regardless of the discovery configuration,
causing a nil pointer dereference.
Guard the t.dhcp access with a nil check so the self IP is still
updated on the Table even when DHCP discovery is disabled.
Replace conn.OpenStream (non-blocking) with conn.OpenStreamSync so that
the resolver waits for the server's MAX_STREAMS credit replenishment frame
instead of immediately failing when the stream limit is temporarily
exhausted. Also retry on StreamLimitReachedError as defense-in-depth for
servers that are slow or fail to send MAX_STREAMS updates.
Pass a quic.Config with KeepAlivePeriod (15s) to DoQ dial calls instead
of nil, so pooled connections send periodic QUIC PINGs to stay alive and
detect dead paths proactively.
Also add IdleTimeoutError to the DoQ retry conditions alongside io.EOF,
so stale pooled connections trigger a transparent retry instead of
propagating as a query failure.
When port 53 is taken (e.g. by mDNSResponder), ctrld failed with
'could not find available listen ip and port' instead of falling back
to port 5354. Root cause: tryUpdateListenerConfig() checked the
dnsIntercept bool, which is derived in prog.run() AFTER listener
config is resolved.
Fix: check interceptMode string directly (CLI flag + config fallback)
in a new tryUpdateListenerConfigIntercept() that tries 127.0.0.1:53
then 127.0.0.1:5354.
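
The fallback order can be sketched as a probe over candidate addresses.
firstAvailable is a hypothetical helper, not the body of
tryUpdateListenerConfigIntercept(), and it only checks UDP:

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// firstAvailable returns the first candidate address that can currently be
// bound for UDP. The probe socket is closed right away, so this is a
// best-effort check: another process could still take the port afterwards.
func firstAvailable(candidates []string) (string, error) {
	for _, addr := range candidates {
		pc, err := net.ListenPacket("udp", addr)
		if err != nil {
			continue // port taken or not permitted; try the next candidate
		}
		pc.Close()
		return addr, nil
	}
	return "", errors.New("could not find available listen ip and port")
}

func main() {
	addr, err := firstAvailable([]string{"127.0.0.1:53", "127.0.0.1:5354"})
	fmt.Println(addr, err)
}
```
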
Also updates buildPFAnchorRules() to use the actual listener IP/port
from config instead of hardcoded 127.0.0.1:53, so pf rules redirect
to wherever ctrld is actually listening.
- Update comment in ensurePFAnchorReference: pfctl -sn returns
rdr-anchor only (nat-anchor not used by ctrld)
- Update nat-anchor table entry in pf-dns-intercept.md
- Add pf nuances 10-16 from investigation: cross-AF redirect,
block return, sendmsg EINVAL, nat-on-lo0, raw sockets, DIOCNATLOOK,
and the pragmatic IPv6 block solution
upstreamConfigFor() used strings.Contains(":") to decide whether to
append ":53", which always evaluates true for IPv6 addresses. This left
bare addresses like "2a0d:6fc0:9b0:3600::1" without brackets or port,
causing net.Dial to reject with "too many colons in address".
Use net.JoinHostPort() which handles IPv6 bracketing automatically,
producing "[2a0d:6fc0:9b0:3600::1]:53".
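
A small comparison of the two approaches, assuming the input is a bare host
with no port. naiveAddr and dnsAddr are illustrative names, not the actual
upstreamConfigFor() code:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// naiveAddr mimics the buggy check: any colon is taken to mean that a port
// is already present, which is always the case for IPv6 literals.
func naiveAddr(host string) string {
	if strings.Contains(host, ":") {
		return host
	}
	return host + ":53"
}

// dnsAddr appends port 53 with net.JoinHostPort, which adds brackets around
// IPv6 literals.
func dnsAddr(host string) string {
	return net.JoinHostPort(host, "53")
}

func main() {
	for _, h := range []string{"192.0.2.1", "2a0d:6fc0:9b0:3600::1"} {
		fmt.Printf("naive=%s fixed=%s\n", naiveAddr(h), dnsAddr(h))
	}
}
```
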
This commit adds a new `ctrld log tail` subcommand that streams
runtime debug logs to the terminal in real-time, similar to `tail -f`.
Changes:
- log_writer.go: Add Subscribe/tailLastLines for fan-out to tail clients
- control_server.go: Add /log/tail endpoint with streaming response
- Internal logging: subscribes to logWriter for live data
- File-based logging: polls log file for new data (200ms interval)
- Sends last N lines as initial context on connect
- commands.go: Add `log tail` cobra subcommand with --lines/-n flag
- control_client.go: Add postStream() with no timeout for long-lived connections
Usage:
sudo ctrld log tail # shows last 10 lines then follows
sudo ctrld log tail -n 50 # shows last 50 lines then follows
Ctrl+C to stop
The continue statement only advanced the inner loop, so
loopback/local IPs (e.g. 127.0.0.1) were never filtered.
This caused ctrld to use itself as bootstrap DNS when already
installed as the system resolver — a self-referential loop.
Use the same isLocal flag pattern as getDNSFromScutil() and
getAllDHCPNameservers().
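
The flag pattern looks roughly like this. filterLocal is a hypothetical
function operating on string addresses, written only to show how the flag
carries the inner loop's result to the outer loop:

```go
package main

import (
	"fmt"
	"net"
)

// filterLocal drops nameservers that are loopback or that match one of the
// machine's own addresses.
func filterLocal(nameservers, localIPs []string) []string {
	var out []string
	for _, s := range nameservers {
		ns := net.ParseIP(s)
		if ns == nil {
			continue // skip entries that are not IP literals
		}
		isLocal := ns.IsLoopback()
		for _, l := range localIPs {
			if ns.Equal(net.ParseIP(l)) {
				isLocal = true
				break // leaves the inner loop only; the flag carries the result out
			}
		}
		if isLocal {
			continue // applies to the outer loop, so this nameserver is skipped
		}
		out = append(out, s)
	}
	return out
}

func main() {
	ns := []string{"127.0.0.1", "192.168.1.5", "8.8.8.8"}
	fmt.Println(filterLocal(ns, []string{"192.168.1.5"}))
}
```
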
Send all available hostname sources (ComputerName, LocalHostName,
HostName, os.Hostname) in the metadata map when provisioning.
This allows the API to detect and repair generic hostnames like
'Mac' by picking the best available source server-side.
Belt and suspenders: preferredHostname() picks the right one
client-side, but metadata gives the API a second chance.
macOS Sequoia with Private Wi-Fi Address enabled causes os.Hostname()
to return generic names like "Mac.lan" from DHCP instead of the real
computer name. The /utility provisioning endpoint sends this raw,
resulting in devices named "Mac-lan" in the dashboard.
Fallback chain: ComputerName → LocalHostName → os.Hostname()
LocalHostName can also be affected by DHCP. ComputerName is the
user-set display name from System Settings, fully immune to network state.
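
The fallback chain amounts to picking the first non-empty candidate. A
minimal sketch; firstNonEmpty is a hypothetical helper, and how each value
is obtained on macOS is outside its scope:

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmpty returns the first candidate that is non-empty after
// trimming whitespace, or "" when none qualifies.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if c = strings.TrimSpace(c); c != "" {
			return c
		}
	}
	return ""
}

func main() {
	// Placeholder values standing in for ComputerName, LocalHostName and
	// os.Hostname(), in that order.
	computerName := ""
	localHostName := "Johns-MacBook-Pro"
	osHostname := "Mac.lan"
	fmt.Println(firstNonEmpty(computerName, localHostName, osHostname))
}
```
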
Treat "socket missing" (ENOENT) and connection refused as expected when
probing the log server, and only log when the error indicates something
unexpected. This prevents noisy warnings when the log server has not
started yet.
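
On Unix-like systems the check can be expressed with errors.Is. This is a
sketch; isExpectedProbeErr and probe are hypothetical names and the socket
path is made up:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// isExpectedProbeErr reports whether err only means that the log server is
// not up yet: the socket file is missing or nothing is accepting on it.
func isExpectedProbeErr(err error) bool {
	return errors.Is(err, syscall.ENOENT) || errors.Is(err, syscall.ECONNREFUSED)
}

// probe dials the unix socket at path and returns the dial error, if any.
func probe(path string) error {
	conn, err := net.Dial("unix", path)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	err := probe("/nonexistent-dir/ctrld_log.sock")
	if err != nil && !isExpectedProbeErr(err) {
		fmt.Println("unexpected probe error:", err)
		return
	}
	fmt.Println("log server not reachable yet (expected):", err)
}
```
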
Discovered while doing captive portal tests.
Replace the map-based pool and refCount bookkeeping with a channel-based
pool. Drop the closed state, per-connection address tracking, and extra
mutexes so the pool relies on the channel for concurrency and lifecycle,
matching the approach used in the DoT pool.
Replace the map-based pool and refCount bookkeeping with a channel-based
pool. Drop the closed state, per-connection address tracking, and
extra mutexes so the pool relies on the channel for concurrency and
lifecycle.
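
A generic sketch of a channel-based pool, to show why the channel removes
the need for a map, refcounts and extra mutexes. The pool type, its method
names and fakeConn are hypothetical and do not reflect the actual
implementation:

```go
package main

import (
	"fmt"
	"io"
)

// pool is a fixed-capacity connection pool backed by a buffered channel.
// The channel provides both the storage and the synchronization.
type pool[C io.Closer] struct {
	idle chan C
	dial func() (C, error)
}

func newPool[C io.Closer](size int, dial func() (C, error)) *pool[C] {
	return &pool[C]{idle: make(chan C, size), dial: dial}
}

// get returns an idle connection if one is available, otherwise it dials.
func (p *pool[C]) get() (C, error) {
	select {
	case c := <-p.idle:
		return c, nil
	default:
		return p.dial()
	}
}

// put hands a connection back, closing it when the pool is already full.
func (p *pool[C]) put(c C) {
	select {
	case p.idle <- c:
	default:
		c.Close()
	}
}

// fakeConn stands in for a real network connection.
type fakeConn struct {
	id     int
	closed bool
}

func (f *fakeConn) Close() error { f.closed = true; return nil }

func main() {
	next := 0
	p := newPool(1, func() (*fakeConn, error) {
		next++
		return &fakeConn{id: next}, nil
	})
	a, _ := p.get()
	p.put(a)
	b, _ := p.get()
	fmt.Println(a == b) // the pooled connection is reused
}
```
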
Add a connection health check in getConn to validate TLS connections
before reusing them from the pool. This prevents io.EOF errors when
reusing connections that were closed by the server (e.g., due to idle
timeout).
Add guard checks to prevent panics when processing client info with
empty IP addresses. Replace netip.MustParseAddr with ParseAddr to
handle invalid IP addresses gracefully instead of panicking.
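
The difference between the two parsers, in a small sketch.
isLoopbackClient is a hypothetical stand-in and not the real queryFromSelf:

```go
package main

import (
	"fmt"
	"net/netip"
)

// isLoopbackClient reports whether ipStr is a loopback address. Empty or
// malformed input yields false; netip.MustParseAddr would panic on it.
func isLoopbackClient(ipStr string) bool {
	if ipStr == "" {
		return false
	}
	addr, err := netip.ParseAddr(ipStr)
	if err != nil {
		return false
	}
	return addr.IsLoopback()
}

func main() {
	for _, s := range []string{"127.0.0.1", "", "not-an-ip", "192.0.2.1"} {
		fmt.Printf("%q -> %v\n", s, isLoopbackClient(s))
	}
}
```
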
Add test to verify queryFromSelf handles IP addresses safely.
Remove separate watchLinkState function and integrate link state change
handling directly into monitorNetworkChanges. This consolidates network
monitoring logic into a single place and simplifies the codebase.
Update netlink dependency from v1.2.1-beta.2 to v1.3.1 and netns from
v0.0.4 to v0.0.5 to use stable versions.
Add DNS suffix matching for non-physical adapters when domain-joined.
This allows interfaces with matching DNS suffix to be considered valid
even if not in validInterfacesMap, improving DNS server discovery for
remote VPN scenarios.
Disable warnings from the ghw library when retrieving chassis information.
These warnings report recoverable errors and only add noise to the logs.
Using WithDisableWarnings() suppresses them while maintaining
functionality.
Remove the transport Close() call from the DoH3 error handling path.
The transport is shared and reused across requests, and closing it
on error would break subsequent requests. The transport lifecycle
is already properly managed by the http.Client and the finalizer
set in newDOH3Transport().
Implement TCP/TLS connection pooling for DoT resolver to match DoQ
performance. Previously, DoT created a new TCP/TLS connection for every
DNS query, incurring significant TLS handshake overhead. Now connections are
reused across queries, eliminating this overhead for subsequent requests.
The implementation follows the same pattern as DoQ, using parallel dialing
and connection pooling to achieve comparable performance characteristics.
Replace boolean rebootstrap flag with a three-state atomic integer to
prevent concurrent SetupTransport calls during rebootstrap. The atomic
state machine ensures only one goroutine can proceed from "started" to
"in progress", eliminating the need for a mutex while maintaining
thread safety.
States: NotStarted -> Started -> InProgress -> NotStarted
Note that a small race remains but is acceptable: any additional
transports created during the race are functional, and once the
connection is established the unused transports are reclaimed by the
garbage collector.
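
The state transitions can be sketched with compare-and-swap. The constant
names, the upstream type and the counter are illustrative assumptions; the
counter stands in for the SetupTransport call:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const (
	rebootstrapNotStarted int32 = iota
	rebootstrapStarted
	rebootstrapInProgress
)

type upstream struct {
	state  atomic.Int32
	setups atomic.Int32 // counts transport rebuilds, for demonstration only
}

// requestRebootstrap records that a rebootstrap is wanted
// (NotStarted -> Started). It is a no-op if one is already pending.
func (u *upstream) requestRebootstrap() {
	u.state.CompareAndSwap(rebootstrapNotStarted, rebootstrapStarted)
}

// maybeRebootstrap rebuilds transports only if this goroutine wins the
// Started -> InProgress transition, then resets the state to NotStarted.
func (u *upstream) maybeRebootstrap() bool {
	if !u.state.CompareAndSwap(rebootstrapStarted, rebootstrapInProgress) {
		return false
	}
	u.setups.Add(1) // stand-in for rebuilding the transports
	u.state.Store(rebootstrapNotStarted)
	return true
}

func main() {
	var u upstream
	u.requestRebootstrap()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			u.maybeRebootstrap()
		}()
	}
	wg.Wait()
	fmt.Println(u.setups.Load()) // only one goroutine performs the rebuild
}
```
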
Consolidate DoH/DoH3/DoQ transport initialization into a single
SetupTransport method and introduce generic helper functions to eliminate
duplicated IP stack selection logic across transport getters.
This reduces code duplication by ~77 lines while maintaining the same
functionality.
Implement QUIC connection pooling for DoQ resolver to match DoH3
performance. Previously, DoQ created a new QUIC connection for every
DNS query, incurring significant handshake overhead. Now connections are
reused across queries, eliminating this overhead for subsequent requests.
The implementation follows the same pattern as DoH3, using parallel dialing
and connection pooling to achieve comparable performance characteristics.
- Add comprehensive documentation for ctrld v2.0.0 breaking changes
- Document removal of automatic configuration for router/server platforms
- Provide step-by-step migration guide for affected users
- Include detailed dnsmasq and Windows Server configuration examples
- Update README.md to reflect v2.0.0 installer URLs and Go version requirements
- Remove references to automatic dnsmasq upstream configuration in README
- Add detailed package documentation to engine.go explaining the rule matching
system, supported rule types (Network, MAC, Domain), and priority ordering
- Include usage example demonstrating typical API usage patterns
- Remove unused Type() method from RuleMatcher interface and implementations
- Maintain backward compatibility while improving code documentation
The documentation explains the policy-based DNS routing system and how different
rule types interact with configurable priority ordering.
Remove StopOnFirstMatch field that was defined but never used in the
actual matching logic.
The current implementation always evaluates all rule types and applies
a fixed precedence (Domain > MAC > Network), making the StopOnFirstMatch
field unnecessary.
Changes:
- Remove StopOnFirstMatch from MatchingConfig structs
- Update DefaultMatchingConfig() function
- Update all test cases and references
- Simplify configuration to only include Order field
This cleanup removes dead code and simplifies the configuration API
without changing any functional behavior.