942 Commits

Author SHA1 Message Date
Cuong Manh Le 7b360288ed doq: validate DNS-over-QUIC response framing
DoQ responses are length-prefixed per RFC 9250. The resolver previously
assumed the stream always contained at least two bytes and unpacked from
buf[2:], which could panic on truncated or malicious replies.

Validate the prefix against the bytes read, return a clear error, and
retire the connection from the pool on framing failure. Unpack only the
slice declared by the prefix so a short read cannot be misinterpreted as
a full message.

Add regression coverage with a small test server that returns malformed
raw payloads (empty, one byte, prefix-only, prefix larger than payload).
2026-05-16 04:13:11 +07:00
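The framing check described in 7b360288ed can be sketched as follows. This is a minimal illustration, not the actual change: the names `framedPayload` and `errShortFrame` are hypothetical, and only the bounds check on the 2-byte big-endian length prefix is shown (pool retirement and DNS unpacking are left out).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errShortFrame is a hypothetical sentinel error, not a name from the commit.
var errShortFrame = errors.New("doq: malformed length-prefixed response")

// framedPayload returns exactly the bytes declared by the 2-byte big-endian
// length prefix, or an error when buf is too short to hold them.
func framedPayload(buf []byte) ([]byte, error) {
	if len(buf) < 2 {
		return nil, fmt.Errorf("%w: got %d bytes, need at least 2", errShortFrame, len(buf))
	}
	n := int(binary.BigEndian.Uint16(buf[:2]))
	if n > len(buf)-2 {
		return nil, fmt.Errorf("%w: prefix declares %d bytes, %d available", errShortFrame, n, len(buf)-2)
	}
	// Slice by the declared length so trailing bytes are never unpacked.
	return buf[2 : 2+n], nil
}

func main() {
	inputs := [][]byte{
		{},                       // empty
		{0x00},                   // one byte
		{0x00, 0x04},             // prefix only
		{0x00, 0x04, 0xaa},       // prefix larger than payload
		{0x00, 0x02, 0xaa, 0xbb}, // well-formed frame
	}
	for _, in := range inputs {
		msg, err := framedPayload(in)
		fmt.Printf("%x -> %x, err=%v\n", in, msg, err)
	}
}
```

The four malformed inputs mirror the regression cases listed in the commit message; each yields an error instead of an out-of-range slice.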
Cuong Manh Le 65d3d468f7 Bump golang.org/x/net to v0.53.0 2026-05-12 12:42:41 +07:00
Cuong Manh Le 01490434a6 cmd/cli: rate-limit PIN brute-force on control socket
Currently there is no limit on PIN attempts, allowing unlimited
brute force if an attacker gains socket access. While the socket is
root-only by default, rate limiting is cheap defense-in-depth.
2026-05-12 12:42:07 +07:00
Cuong Manh Le a61677b6e4 cmd/cli: use os.CreateTemp for symlink-safe temp file creation
Current code writes to a predictable path, which on systems without
`fs.protected_symlinks` (e.g. embedded routers) could allow a local
attacker with API compromise to perform symlink attacks.
2026-05-12 12:41:56 +07:00
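A rough sketch of the idea in a61677b6e4: `os.CreateTemp` picks a random name and creates the file exclusively, so a pre-planted symlink at a predictable path is not followed. The helper `withTempFile` and the `ctrld-*.tmp` pattern are hypothetical and only illustrate the call.

```go
package main

import (
	"fmt"
	"os"
)

// withTempFile is a hypothetical helper: it writes data to a freshly created
// temp file, hands the path to fn, and removes the file afterwards.
func withTempFile(pattern string, data []byte, fn func(path string) error) error {
	// An empty dir argument means the default temp directory. The "*" in
	// pattern is replaced by a random string chosen by os.CreateTemp.
	f, err := os.CreateTemp("", pattern)
	if err != nil {
		return err
	}
	defer os.Remove(f.Name())
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	return fn(f.Name())
}

func main() {
	err := withTempFile("ctrld-*.tmp", []byte("example"), func(path string) error {
		fmt.Println("wrote", path)
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```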
Cuong Manh Le 8e2ef7ca65 all: explicit TLS MinVersion in tls.Config
Go's client-side default is already TLS 1.2+ (since Go 1.18), but making this
explicit satisfies RFC 7858/9250 recommendations and makes the security
intent clear for auditors.
2026-05-12 12:41:47 +07:00
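For illustration, an explicit minimum version in a `tls.Config` might look like the sketch below. The helper name `newTLSConfig` and the server name are hypothetical, and TLS 1.2 as the chosen floor is an assumption based on the commit message, not a confirmed detail of 8e2ef7ca65.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newTLSConfig is a hypothetical constructor that states the TLS floor
// explicitly instead of relying on the crypto/tls default.
func newTLSConfig(serverName string) *tls.Config {
	return &tls.Config{
		ServerName: serverName,
		MinVersion: tls.VersionTLS12, // assumed floor; see lead-in
	}
}

func main() {
	cfg := newTLSConfig("dns.example.com")
	fmt.Println("min version is TLS 1.2:", cfg.MinVersion == tls.VersionTLS12)
}
```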
Codescribe 1735d3d55b cmd/cli: skip upstream.os healthcheck when WFP loopback protect enabled
When WFP loopback protect is active, the upstream.os healthcheck will
always fail because an external WFP block filter is interfering with
plain DNS. This demotes those expected failures to debug level and
returns errOsHealthcheckSuppressed so the recovery loop treats them
as non-fatal, eliminating the log spam described in #526.
2026-05-07 19:37:42 +07:00
Codescribe 81aa6b237b dns_intercept: add WFP loopback protect for VPN block-outside-dns
When third-party VPN software (e.g., OpenVPN) installs WFP block filters via
block-outside-dns, all DNS traffic to non-tunnel interfaces is blocked —
including DNS to 127.0.0.1 (ctrld's NRPT target). This breaks DNS mode
interception because the NRPT catch-all rule routes queries to loopback,
but WFP blocks the connection before it reaches ctrld's listener.

Fix: after exhausting all NRPT recovery attempts, activate a minimal WFP
session with "hard permit" filters (FWPM_FILTER_FLAG_CLEAR_ACTION_RIGHT)
for DNS to localhost in a max-priority sublayer (weight 0xFFFF). This
overrides the VPN's block for loopback DNS only, while preserving the
VPN's DNS leak protection for all other (non-loopback) DNS traffic.

The loopback protect is:
- Only activated when NRPT probes fail (not preemptively)
- Harmless when no conflicting WFP blocks exist (permit-only, no blocks)
- Persistent until ctrld shutdown (survives VPN reconnect cycles)
- Cleaned up by the existing cleanupWFPFilters path on shutdown
2026-04-30 19:30:43 +07:00
Codescribe 8abeeea4c3 log: persist internal runtime logs to disk
Add file-backed persistence to the internal logWriter so runtime logs
survive service restarts. When internal logging is enabled (CD mode,
no explicit log_path), writes are teed to both the existing in-memory
ring buffer and a rotated file on disk (ctrld.log in the home directory).

File rotation: 5MB max with 1 backup (ctrld.log.1), so max ~10MB on disk.
Log view/send now reads from the persisted files (including backup) to
provide complete history across restarts. Live tail continues to use
the in-memory subscriber mechanism unchanged.

Activation: same conditions as existing internal logging — CD mode only,
no log_path configured. No new config options or dependencies.
2026-04-30 19:19:19 +07:00
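The rotation scheme from 8abeeea4c3 (one live file plus a single `.1` backup) can be sketched roughly as below. The type `rotatingFile` and the `demo` function are hypothetical, the tee to the in-memory ring buffer is omitted, and the demo uses an 8-byte cap instead of 5MB so the behaviour is easy to follow.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// rotatingFile is a hypothetical writer: when a write would push the live
// file past maxSize, the file is renamed to path+".1" (replacing any older
// backup) and a fresh live file is started.
type rotatingFile struct {
	path    string
	maxSize int64
	f       *os.File
	size    int64
}

func (r *rotatingFile) open() error {
	f, err := os.OpenFile(r.path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
	if err != nil {
		return err
	}
	st, err := f.Stat()
	if err != nil {
		f.Close()
		return err
	}
	r.f, r.size = f, st.Size()
	return nil
}

func (r *rotatingFile) rotate() error {
	if err := r.f.Close(); err != nil {
		return err
	}
	r.f = nil
	if err := os.Rename(r.path, r.path+".1"); err != nil {
		return err
	}
	return r.open()
}

func (r *rotatingFile) Write(p []byte) (int, error) {
	if r.f == nil {
		if err := r.open(); err != nil {
			return 0, err
		}
	}
	// Rotate before the write that would exceed the cap.
	if r.size > 0 && r.size+int64(len(p)) > r.maxSize {
		if err := r.rotate(); err != nil {
			return 0, err
		}
	}
	n, err := r.f.Write(p)
	r.size += int64(n)
	return n, err
}

// demo writes three 4-byte records with an 8-byte cap into a temp directory
// and returns the sizes of the live file and of its backup.
func demo() (int64, int64, error) {
	dir, err := os.MkdirTemp("", "rotate-demo-")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	r := &rotatingFile{path: filepath.Join(dir, "ctrld.log"), maxSize: 8}
	for i := 0; i < 3; i++ {
		if _, err := r.Write([]byte("abc\n")); err != nil {
			return 0, 0, err
		}
	}
	if err := r.f.Close(); err != nil {
		return 0, 0, err
	}
	live, err := os.Stat(r.path)
	if err != nil {
		return 0, 0, err
	}
	backup, err := os.Stat(r.path + ".1")
	if err != nil {
		return 0, 0, err
	}
	return live.Size(), backup.Size(), nil
}

func main() {
	live, backup, err := demo()
	fmt.Println(live, backup, err)
}
```

With one backup, disk usage is bounded by roughly twice the cap, which matches the "max ~10MB" figure in the commit message for a 5MB limit.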
Codescribe b3c670b17e dns: fix recovery race condition during rapid network transitions
When multiple network changes fire in quick succession (e.g., VPN
disconnect + interface swap), the second handleRecovery() call cancels
the first but inherits stale DoH transports, causing DNS blackouts
of up to 30 seconds.

Three changes to reduce worst-case recovery from ~30s to <3s:

1. ForceReBootstrap() on recovery entry — closes dead connections and
   creates fresh transports synchronously before probing, replacing the
   lazy ReBootstrap() flag that left stale connections for probes to hit.

2. Debounce handleRecovery() for network changes (500ms window) — only
   the recovery flow is debounced; all other state updates (IP, pf
   anchor, VPN DNS, tunnel checks) still run immediately on every event.
   This eliminates the cancel-and-restart race without missing state.

3. Combined effect: ForceReBootstrap closes old in-flight connections
   (closeTransports) and builds new ones (SetupTransport) atomically,
   so recovery probes never inherit dead connections from a prior
   recovery attempt.
2026-04-30 19:19:19 +07:00
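The debounce in item 2 of b3c670b17e can be illustrated with a timer that is restarted on every event. This is a generic sketch under assumptions: the type `debouncer` and the function `burst` are hypothetical, and the real code debounces only the recovery flow while other state updates run immediately.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// debouncer is a hypothetical helper: each trigger cancels the pending
// callback and schedules a new one, so a burst of events within the delay
// window results in a single call.
type debouncer struct {
	mu    sync.Mutex
	delay time.Duration
	timer *time.Timer
}

func (d *debouncer) trigger(fn func()) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.timer != nil {
		d.timer.Stop()
	}
	d.timer = time.AfterFunc(d.delay, fn)
}

// burst fires n back-to-back triggers and reports how many times the
// debounced callback actually ran.
func burst(n int, delay time.Duration) int32 {
	var runs atomic.Int32
	d := &debouncer{delay: delay}
	for i := 0; i < n; i++ {
		d.trigger(func() { runs.Add(1) })
	}
	time.Sleep(4 * delay) // give the last timer time to fire
	return runs.Load()
}

func main() {
	fmt.Println("recovery runs:", burst(5, 50*time.Millisecond))
}
```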
Cuong Manh Le 70b45710e7 docs: improve common gotchas and non-obvious patterns
README.md: fix Go version requirement (1.23 -> 1.24), update OS
support architectures (add arm64/mipsle/mips64 for Linux, arm64 for
Windows/FreeBSD, remove windows/arm), fix broken PowerShell install
path, demote H1 section headings to H2.
2026-04-30 19:19:19 +07:00
Cuong Manh Le 2742669bc1 fix: prevent panic on network change during SetSelfIP
SetSelfIP unconditionally accessed t.dhcp, but t.dhcp is only
initialized when DHCP discovery is enabled. A network change event
can fire SetSelfIP regardless of the discovery configuration,
causing a nil pointer dereference.

Guard the t.dhcp access with a nil check so the self IP is still
updated on the Table even when DHCP discovery is disabled.
2026-04-30 19:19:19 +07:00
Cuong Manh Le a767ebdaa5 doq: use OpenStreamSync and retry on StreamLimitReachedError
Replace conn.OpenStream (non-blocking) with conn.OpenStreamSync so that
the resolver waits for the server's MAX_STREAMS credit replenishment frame
instead of immediately failing when the stream limit is temporarily
exhausted. Also retry on StreamLimitReachedError as defense-in-depth for
servers that are slow or fail to send MAX_STREAMS updates.
2026-04-30 19:19:19 +07:00
Codescribe a92d20cef8 doq: configure QUIC keep-alive and retry on idle timeout
Pass a quic.Config with KeepAlivePeriod (15s) to DoQ dial calls instead
of nil, so pooled connections send periodic QUIC PINGs to stay alive and
detect dead paths proactively.

Also add IdleTimeoutError to the DoQ retry conditions alongside io.EOF,
so stale pooled connections trigger a transparent retry instead of
propagating as a query failure.
2026-04-30 19:19:19 +07:00
Codescribe a8821e6d00 fix(darwin): support non-standard listener port in intercept mode
When port 53 is taken (e.g. by mDNSResponder), ctrld failed with
'could not find available listen ip and port' instead of falling back
to port 5354. Root cause: tryUpdateListenerConfig() checked the
dnsIntercept bool, which is derived in prog.run() AFTER listener
config is resolved.

Fix: check interceptMode string directly (CLI flag + config fallback)
in a new tryUpdateListenerConfigIntercept() that tries 127.0.0.1:53
then 127.0.0.1:5354.

Also updates buildPFAnchorRules() to use the actual listener IP/port
from config instead of hardcoded 127.0.0.1:53, so pf rules redirect
to wherever ctrld is actually listening.
2026-04-30 19:19:19 +07:00
Codescribe a3880beec2 docs: port IPv6 learnings and comment fixes to master
- Update comment in ensurePFAnchorReference: pfctl -sn returns
  rdr-anchor only (nat-anchor not used by ctrld)
- Update nat-anchor table entry in pf-dns-intercept.md
- Add pf nuances 10-16 from investigation: cross-AF redirect,
  block return, sendmsg EINVAL, nat-on-lo0, raw sockets, DIOCNATLOOK,
  and the pragmatic IPv6 block solution
2026-04-30 19:19:19 +07:00
Codescribe d7124995d2 fix: bracket IPv6 addresses in VPN DNS upstream config
upstreamConfigFor() used strings.Contains() to look for ":" when deciding
whether to append ":53", a check that is always true for IPv6 addresses. This left
bare addresses like "2a0d:6fc0:9b0:3600::1" without brackets or port,
causing net.Dial to reject with "too many colons in address".

Use net.JoinHostPort() which handles IPv6 bracketing automatically,
producing "[2a0d:6fc0:9b0:3600::1]:53".
2026-04-30 19:19:19 +07:00
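The difference described in d7124995d2 can be seen in a small comparison. `naiveAddr` is a reconstruction of the faulty check from the commit description, not the original code, and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// naiveAddr mimics the faulty logic: any ":" is taken to mean a port is
// already present, so bare IPv6 addresses pass through unchanged.
func naiveAddr(host string) string {
	if strings.Contains(host, ":") {
		return host
	}
	return host + ":53"
}

// dnsAddr uses net.JoinHostPort, which brackets IPv6 literals.
func dnsAddr(host string) string {
	return net.JoinHostPort(host, "53")
}

func main() {
	for _, h := range []string{"10.0.0.1", "2a0d:6fc0:9b0:3600::1"} {
		fmt.Printf("naive=%q joined=%q\n", naiveAddr(h), dnsAddr(h))
	}
}
```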
Codescribe 86dafc432d Add log tail command for live log streaming
This commit adds a new `ctrld log tail` subcommand that streams
runtime debug logs to the terminal in real-time, similar to `tail -f`.

Changes:
- log_writer.go: Add Subscribe/tailLastLines for fan-out to tail clients
- control_server.go: Add /log/tail endpoint with streaming response
  - Internal logging: subscribes to logWriter for live data
  - File-based logging: polls log file for new data (200ms interval)
  - Sends last N lines as initial context on connect
- commands.go: Add `log tail` cobra subcommand with --lines/-n flag
- control_client.go: Add postStream() with no timeout for long-lived connections

Usage:
  sudo ctrld log tail          # shows last 10 lines then follows
  sudo ctrld log tail -n 50    # shows last 50 lines then follows
  Ctrl+C to stop
2026-04-30 19:19:19 +07:00
Cuong Manh Le ca8d07d3f5 fix(darwin): correct pf rules tests 2026-04-30 19:19:19 +07:00
Cuong Manh Le 2aaa78ef48 fix(windows): make staticcheck happy 2026-04-30 19:19:19 +07:00
Codescribe 0f2a930cf8 feat: robust username detection and CI updates
Add platform-specific username detection for Control D metadata:
- macOS: directory services (dscl) with console user fallback
- Linux: systemd loginctl, utmp, /etc/passwd traversal
- Windows: WTS session enumeration, registry, token lookup
2026-04-30 19:19:19 +07:00
Codescribe 5a6163142c feat: add VPN DNS split routing 2026-04-30 19:19:19 +07:00
Codescribe 402771bed6 feat: add Windows NRPT and WFP DNS interception 2026-04-30 19:19:19 +07:00
Codescribe a99dcca288 feat: add macOS pf DNS interception 2026-04-30 19:19:19 +07:00
Codescribe 395335162f feat: introduce DNS intercept mode infrastructure 2026-04-30 19:19:19 +07:00
Codescribe c56d4771de docs: add DNS Intercept Mode section to README 2026-04-30 19:19:19 +07:00
Codescribe ea48186d73 Fix dnsFromResolvConf not filtering loopback IPs
The continue statement only broke out of the inner loop, so
loopback/local IPs (e.g. 127.0.0.1) were never filtered.
This caused ctrld to use itself as bootstrap DNS when already
installed as the system resolver — a self-referential loop.

Use the same isLocal flag pattern as getDNSFromScutil() and
getAllDHCPNameservers().
2026-04-30 19:19:19 +07:00
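The flag pattern mentioned in ea48186d73 can be sketched as below. A bare `continue` inside the inner loop only advances that inner loop, so the outer iteration still appends the address; setting a flag and checking it after the inner loop avoids that. The function `filterLocal` and its parameters are hypothetical.

```go
package main

import "fmt"

// filterLocal drops every nameserver that also appears in local.
func filterLocal(nameservers, local []string) []string {
	var out []string
	for _, ns := range nameservers {
		isLocal := false
		for _, l := range local {
			if ns == l {
				isLocal = true
				break
			}
		}
		if isLocal {
			continue // skips the append for this nameserver
		}
		out = append(out, ns)
	}
	return out
}

func main() {
	got := filterLocal(
		[]string{"127.0.0.1", "192.0.2.53", "::1"},
		[]string{"127.0.0.1", "::1"},
	)
	fmt.Println(got)
}
```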
Cuong Manh Le bc71622deb Use go1.25 for CI 2026-04-30 19:19:19 +07:00
Codescribe 846aaac27a fix: include hostname hints in metadata for API-side fallback
Send all available hostname sources (ComputerName, LocalHostName,
HostName, os.Hostname) in the metadata map when provisioning.
This allows the API to detect and repair generic hostnames like
'Mac' by picking the best available source server-side.

Belt and suspenders: preferredHostname() picks the right one
client-side, but metadata gives the API a second chance.
2026-04-30 19:19:19 +07:00
Codescribe f1e49a7ee6 fix(darwin): use scutil for provisioning hostname (#485)
macOS Sequoia with Private Wi-Fi Address enabled causes os.Hostname()
to return generic names like "Mac.lan" from DHCP instead of the real
computer name. The /utility provisioning endpoint sends this raw,
resulting in devices named "Mac-lan" in the dashboard.

Fallback chain: ComputerName → LocalHostName → os.Hostname()

LocalHostName can also be affected by DHCP. ComputerName is the
user-set display name from System Settings, fully immune to network state.
2026-04-30 19:19:19 +07:00
Cuong Manh Le 878b3d7920 fix(cli): avoid warning when HTTP log server is not yet available
Treat "socket missing" (ENOENT) and connection refused as expected when
probing the log server, and only log when the error indicates something
unexpected. This prevents noisy warnings when the log server has not
started yet.

Discovered while doing captive portal tests.
2026-04-30 19:19:19 +07:00
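A possible shape for the error classification in 878b3d7920, assuming the probe dials a Unix socket. `isExpectedProbeErr`, `probe`, and the socket path are hypothetical, and the actual code may classify errors differently.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// isExpectedProbeErr reports whether err only means "the server is not up
// yet": the socket file is missing or nothing is accepting connections.
func isExpectedProbeErr(err error) bool {
	return errors.Is(err, syscall.ENOENT) || errors.Is(err, syscall.ECONNREFUSED)
}

// probe dials the given Unix socket and closes the connection right away.
func probe(socketPath string) error {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	err := probe("/nonexistent-dir/ctrld-log.sock")
	switch {
	case err == nil:
		fmt.Println("log server reachable")
	case isExpectedProbeErr(err):
		// Expected during startup: stay quiet.
	default:
		fmt.Println("unexpected probe error:", err)
	}
}
```

`errors.Is` walks the wrapped error chain, so the check works on the error returned by `net.Dial` without type assertions.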
Cuong Manh Le f1b93c81bc refactor(doq): simplify DoQ connection pool implementation
Replace the map-based pool and refCount bookkeeping with a channel-based
pool. Drop the closed state, per-connection address tracking, and extra
mutexes so the pool relies on the channel for concurrency and lifecycle,
matching the approach used in the DoT pool.
2026-04-30 19:19:19 +07:00
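A channel-based pool of the kind f1b93c81bc describes could look roughly like this. It is a generic sketch, not the actual pool: the names `connPool` and `fakeConn` are hypothetical, and dialing and health checks are left out.

```go
package main

import (
	"fmt"
	"io"
)

// connPool is a hypothetical pool: a buffered channel holds idle
// connections, so no extra mutex or reference counting is needed.
type connPool struct {
	conns chan io.Closer
}

func newConnPool(size int) *connPool {
	return &connPool{conns: make(chan io.Closer, size)}
}

// get returns an idle connection if one is available.
func (p *connPool) get() (io.Closer, bool) {
	select {
	case c := <-p.conns:
		return c, true
	default:
		return nil, false
	}
}

// put returns a connection to the pool, closing it when the pool is full.
func (p *connPool) put(c io.Closer) {
	select {
	case p.conns <- c:
	default:
		c.Close()
	}
}

// fakeConn stands in for a real connection in this example.
type fakeConn struct{ closed bool }

func (f *fakeConn) Close() error { f.closed = true; return nil }

func main() {
	p := newConnPool(1)
	a, b := &fakeConn{}, &fakeConn{}
	p.put(a)
	p.put(b) // pool is full, so b is closed instead of stored
	_, ok := p.get()
	fmt.Println(ok, a.closed, b.closed)
}
```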
Cuong Manh Le 60dd366cc4 refactor(dot): simplify DoT connection pool implementation
Replace the map-based pool and refCount bookkeeping with a channel-based
pool. Drop the closed state, per-connection address tracking, and
extra mutexes so the pool relies on the channel for concurrency and
lifecycle.
2026-04-30 19:19:19 +07:00
Cuong Manh Le e45e56c021 fix(dot): validate connections before reuse to prevent io.EOF errors
Add connection health check in getConn to validate TLS connections
before reusing them from the pool. This prevents io.EOF errors when
reusing connections that were closed by the server (e.g., due to idle
timeout).
2026-04-30 19:19:19 +07:00
Cuong Manh Le 6f331f19c8 fix(dns): handle empty and invalid IP addresses gracefully
Add guard checks to prevent panics when processing client info with
empty IP addresses. Replace netip.MustParseAddr with ParseAddr to
handle invalid IP addresses gracefully instead of panicking.

Add test to verify queryFromSelf handles IP addresses safely.
2026-04-30 19:19:19 +07:00
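The switch from `netip.MustParseAddr` to `netip.ParseAddr` in 6f331f19c8 can be illustrated as follows. The helper `parseClientIP` is hypothetical; the point is only that invalid input becomes a checked result rather than a panic.

```go
package main

import (
	"fmt"
	"net/netip"
)

// parseClientIP returns the parsed address and true, or the zero Addr and
// false when s is empty or not a valid IP address.
func parseClientIP(s string) (netip.Addr, bool) {
	if s == "" {
		return netip.Addr{}, false
	}
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return netip.Addr{}, false
	}
	return addr, true
}

func main() {
	for _, s := range []string{"192.168.1.10", "", "not-an-ip"} {
		addr, ok := parseClientIP(s)
		fmt.Printf("%q -> %v, ok=%v\n", s, addr, ok)
	}
}
```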
Cuong Manh Le e4ca728ef0 refactor(network): consolidate network change monitoring
Remove separate watchLinkState function and integrate link state change
handling directly into monitorNetworkChanges. This consolidates network
monitoring logic into a single place and simplifies the codebase.

Update netlink dependency from v1.2.1-beta.2 to v1.3.1 and netns from
v0.0.4 to v0.0.5 to use stable versions.
2026-04-30 19:19:19 +07:00
Cuong Manh Le 8117084d39 fix(windows): improve DNS server discovery for domain-joined machines
Add DNS suffix matching for non-physical adapters when domain-joined.
This allows interfaces with matching DNS suffix to be considered valid
even if not in validInterfacesMap, improving DNS server discovery for
remote VPN scenarios.
2026-04-30 19:19:19 +07:00
Cuong Manh Le e23451df37 fix(system): disable ghw warnings to reduce log noise
Disable warnings from the ghw library when retrieving chassis information.
The warnings report recoverable errors and only add log noise. Using
WithDisableWarnings() suppresses them while maintaining functionality.
2026-04-30 19:19:19 +07:00
Cuong Manh Le 43d4e1957c fix: remove incorrect transport close on DoH3 error
Remove the transport Close() call from DoH3 error handling path.
The transport is shared and reused across requests, and closing it
on error would break subsequent requests. The transport lifecycle
is already properly managed by the http.Client and the finalizer
set in newDOH3Transport().
2026-04-30 19:19:19 +07:00
Cuong Manh Le ba3dd3a4b0 Include system metadata when posting to utility API 2026-04-30 19:19:19 +07:00
Cuong Manh Le 8b92dc97a3 perf(dot): implement connection pooling for improved performance
Implement TCP/TLS connection pooling for DoT resolver to match DoQ
performance. Previously, DoT created a new TCP/TLS connection for every
DNS query, incurring significant TLS handshake overhead. Now connections are
reused across queries, eliminating this overhead for subsequent requests.

The implementation follows the same pattern as DoQ, using parallel dialing
and connection pooling to achieve comparable performance characteristics.
2026-04-30 19:19:19 +07:00
Cuong Manh Le 9158cd7835 fix(config): use three-state atomic for rebootstrap to prevent data race
Replace boolean rebootstrap flag with a three-state atomic integer to
prevent concurrent SetupTransport calls during rebootstrap. The atomic
state machine ensures only one goroutine can proceed from "started" to
"in progress", eliminating the need for a mutex while maintaining
thread safety.

States: NotStarted -> Started -> InProgress -> NotStarted

Note that the race condition is still acceptable because any additional
transports created during the race are functional. Once the connection
is established, the unused transports are safely handled by the garbage
collector.
2026-04-30 19:19:19 +07:00
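The state machine in 9158cd7835 can be sketched with compare-and-swap on an atomic integer. The type and constant names here are hypothetical, and the `setups` counter stands in for the SetupTransport call so the effect is observable.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const (
	stateNotStarted int32 = iota
	stateStarted
	stateInProgress
)

// upstream is a hypothetical stand-in for the config type holding the flag.
type upstream struct {
	state  atomic.Int32
	setups atomic.Int32 // counts simulated transport setups
}

// requestRebootstrap moves NotStarted -> Started; repeated requests are no-ops.
func (u *upstream) requestRebootstrap() {
	u.state.CompareAndSwap(stateNotStarted, stateStarted)
}

// maybeRebootstrap runs the setup only for the goroutine that wins the
// Started -> InProgress transition, then resets the state to NotStarted.
func (u *upstream) maybeRebootstrap() bool {
	if !u.state.CompareAndSwap(stateStarted, stateInProgress) {
		return false
	}
	defer u.state.Store(stateNotStarted)
	u.setups.Add(1)
	return true
}

// raceOnce requests one rebootstrap and lets n goroutines compete for it.
func raceOnce(n int) int32 {
	u := &upstream{}
	u.requestRebootstrap()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			u.maybeRebootstrap()
		}()
	}
	wg.Wait()
	return u.setups.Load()
}

func main() {
	fmt.Println("setups performed:", raceOnce(32))
}
```

Only one goroutine can observe the Started state and swap it, so the setup runs once per request without a mutex.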
Cuong Manh Le 2d9603609f refactor(config): consolidate transport setup and eliminate duplication
Consolidate DoH/DoH3/DoQ transport initialization into a single
SetupTransport method and introduce generic helper functions to eliminate
duplicated IP stack selection logic across transport getters.

This reduces code duplication by ~77 lines while maintaining the same
functionality.
2026-04-30 19:19:19 +07:00
Cuong Manh Le e4e655414c perf(doq): implement connection pooling for improved performance
Implement QUIC connection pooling for DoQ resolver to match DoH3
performance. Previously, DoQ created a new QUIC connection for every
DNS query, incurring significant handshake overhead. Now connections are
reused across queries, eliminating this overhead for subsequent requests.

The implementation follows the same pattern as DoH3, using parallel dialing
and connection pooling to achieve comparable performance characteristics.
2026-04-30 19:19:19 +07:00
Cuong Manh Le aacba92698 docs: add documentation for runtime internal logging 2026-04-30 19:19:19 +07:00
Cuong Manh Le c3c9e1a4d7 .github/workflows: temporarily use actions/setup-go
WillAbides/setup-go-faster fails on macOS-latest.

See: https://github.com/WillAbides/setup-go-faster/issues/37
2026-04-30 19:19:19 +07:00
Cuong Manh Le 9a3840954b Upgrade quic-go to v0.57.0 2026-04-30 19:19:19 +07:00
Cuong Manh Le 673308a1fe docs: add v2.0.0 breaking changes documentation
- Add comprehensive documentation for ctrld v2.0.0 breaking changes
- Document removal of automatic configuration for router/server platforms
- Provide step-by-step migration guide for affected users
- Include detailed dnsmasq and Windows Server configuration examples
- Update README.md to reflect v2.0.0 installer URLs and Go version requirements
- Remove references to automatic dnsmasq upstream configuration in README
2026-04-30 19:19:19 +07:00
Cuong Manh Le 2cb0456265 .github/workflows: upgrade staticcheck-action to v1.4.0
While at it, also bump the Go version to 1.24
2026-04-30 19:19:19 +07:00
Cuong Manh Le 9dd4183981 Upgrade quic-go to v0.56.0
Updates #461
2026-04-30 19:19:19 +07:00
Cuong Manh Le aacbcad133 cmd/cli: work around TB.TempDir path too long for Unix socket path
Discovered while testing v2.0.0 GitHub MR.

See: https://github.com/golang/go/issues/62614

While at it, also fix staticcheck linter on Windows.
2026-04-30 19:19:19 +07:00