Commit Graph

123 Commits

Author SHA1 Message Date
Cuong Manh Le
b7202f8469 feat: enhance DNS proxy logging with comprehensive flow tracking
Add detailed logging throughout DNS proxy operations to improve visibility
into query processing, cache operations, and upstream resolver performance.

Key improvements:
- DNS server setup and listener management logging
- Complete query processing pipeline visibility
- Cache hit/miss and stale response handling logs
- Upstream resolver iteration and failure tracking
- Resolver-specific logging (OS, DoH, DoT, DoQ, Legacy)
- All log messages capitalized for better readability

This provides comprehensive debugging capabilities for DNS proxy operations
and helps identify performance bottlenecks and failure points in the
resolution chain.
2025-10-09 18:47:18 +07:00
Cuong Manh Le
a72ff1e769 fix: ensure upstream health checks can handle large DNS responses
- Add UpstreamConfig.VerifyMsg() method with proper EDNS0 support
- Replace hardcoded DNS messages in health checks with standardized verification method
- Set EDNS0 buffer size to 4096 bytes to handle large DNS responses
- Add test case for legacy resolver with extensive extra sections
2025-10-09 18:47:18 +07:00
Cuong Manh Le
4792183c0d Add comprehensive documentation to CLI components and core functionality
This commit extends the documentation effort by adding detailed explanatory
comments to key CLI components and core functionality throughout the cmd/
directory. The changes focus on explaining WHY certain logic is needed,
not just WHAT the code does, improving code maintainability and helping
developers understand complex business decisions.

Key improvements:
- Main entry points: Document CLI initialization, logging setup, and cache
  configuration with reasoning for design decisions
- DNS proxy core: Explain DNS proxy constants, data structures, and core
  processing pipeline for handling DNS queries
- Service management: Document service command structure, configuration
  patterns, and platform-specific service handling
- Logging infrastructure: Explain log buffer management, level encoders,
  and log formatting decisions for different use cases
- Metrics and monitoring: Document Prometheus metrics structure, HTTP
  endpoints, and conditional metric collection for performance
- Network handling: Explain Linux-specific network interface filtering,
  virtual interface detection, and DNS configuration management
- Hostname validation: Document RFC1123 compliance and DNS naming
  standards for system compatibility
- Mobile integration: Explain HTTP retry logic, fallback mechanisms, and
  mobile platform integration patterns
- Connection management: Document connection wrapper design to prevent
  log pollution during process lifecycle

Technical details:
- Added explanatory comments to 11 additional files in cmd/cli/
- Maintained consistent documentation style and format
- Preserved all existing functionality while improving code clarity
- Enhanced understanding of complex business logic and platform-specific
  behavior

These comments help future developers understand the reasoning behind
complex decisions, making the codebase more maintainable and reducing
the risk of incorrect modifications during maintenance.
2025-10-09 17:49:21 +07:00
Cuong Manh Le
8b605da861 refactor: convert rootCmd from global to local variable
- Add appVersion variable to store curVersion() result during init
- Change initCLI() to return *cobra.Command
- Move rootCmd creation inside initCLI() as local variable
- Replace all rootCmd.Version usage with appVersion variable
- Update Main() function to capture returned rootCmd from initCLI()
- Remove sync.Once guard from tests and use initCLI() directly
- Remove sync import from test file as it's no longer needed

This refactoring improves encapsulation by eliminating global state,
reduces version computation overhead, and simplifies test setup by
removing the need for sync.Once guards. All tests pass and the
application builds successfully.
2025-10-09 17:49:21 +07:00
Cuong Manh Le
0cd873a88f refactor: move network monitoring to separate goroutine
- Move network monitoring initialization out of serveDNS() function
- Start network monitoring in a separate goroutine during program startup
- Remove context parameter from monitorNetworkChanges() as it's not used
- Simplify serveDNS() function signature by removing unused context parameter
- Ensure network monitoring starts only once during initial run, not on reload

This change improves separation of concerns by isolating network monitoring
from DNS serving logic, and prevents potential issues with multiple
monitoring goroutines if starting multiple listeners.
2025-10-09 17:49:21 +07:00
Cuong Manh Le
ddbb0f0db4 refactor: migrate from zerolog to zap logging library
Replace github.com/rs/zerolog with go.uber.org/zap throughout the codebase
to improve performance and provide better structured logging capabilities.

Key changes:
- Replace zerolog imports with zap and zapcore
- Implement custom Logger wrapper in log.go to maintain zerolog-like API
- Add LogEvent struct with chained methods (Str, Int, Err, Bool, etc.)
- Update all logging calls to use the new zap-based wrapper
- Replace JSON encoders with Console encoders for better readability

Benefits:
- Better performance with zap's optimized logging
- Consistent structured logging across all components
- Maintained zerolog-like API for easy migration
- Proper field context preservation for debugging
- Multi-core logging architecture for better output control

All tests pass and build succeeds.
2025-10-09 17:49:21 +07:00
Cuong Manh Le
35e2a20019 Refactor handleRecovery method and improve tests
- Split handleRecovery into focused helper methods for better maintainability:
  * shouldStartRecovery: handles recovery cancellation logic
  * createRecoveryContext: manages recovery context and cleanup
  * prepareForRecovery: removes DNS settings and initializes OS resolver
  * completeRecovery: resets upstream state and reapplies DNS settings
  * reinitializeOSResolver: reinitializes OS resolver with proper logging
  * Update handleRecovery documentation to reflect new orchestration role

- Improve tests:
  * Add newTestProg helper to reduce test setup duplication
  * Write comprehensive table-driven tests for all recovery methods

This refactoring improves code maintainability, testability, and reduces
complexity while maintaining the same recovery behavior. Each method now
has a single responsibility and can be tested independently.
2025-10-09 17:49:21 +07:00
Cuong Manh Le
41282d0f51 refactor: break down proxy method into smaller focused functions
Split the long proxy method into several smaller methods to improve maintainability
and testability. Each new method has a single responsibility:

- initializeUpstreams: handles upstream configuration setup
- tryCache: manages cache lookup logic
- tryUpstreams: coordinates upstream query attempts
- processUpstream: handles individual upstream query processing
- handleAllUpstreamsFailure: manages failure scenarios
- checkCache: performs cache checks and retrieval
- serveStaleResponse: handles stale cache responses
- shouldContinueWithNextUpstream: determines if failover is needed
- prepareSuccessResponse: formats successful responses

This refactoring:
- Reduces cognitive complexity
- Improves code testability
- Makes the DNS proxy logic flow clearer
- Isolates error handling and edge cases
- Maintains existing functionality

No behavioral changes were made.
2025-10-09 17:49:21 +07:00
Cuong Manh Le
f7fb555c89 Removing Windows Server support 2025-10-09 17:49:21 +07:00
Cuong Manh Le
2e63624f6c Removing router platforms support 2025-10-09 17:49:21 +07:00
Cuong Manh Le
b18cd7ee83 refactor(dns): improve DNS proxy code structure and readability
Break down the large DNS handling function into smaller, focused functions
with clear responsibilities:

- Extract handleDNSQuery from serveDNS handler function
- Create dedicated startListeners function for listener management
- Add standardQueryRequest struct to encapsulate query parameters
- Split special domain handling into separate function
- Add descriptive comments for each new function
- Improve variable names for better clarity (e.g., startTime vs t)

This refactoring improves code maintainability and readability without
changing the core DNS proxy functionality.
2025-10-09 17:49:21 +07:00
Cuong Manh Le
59ece456b1 refactor: improve network interface validation
Add context parameter to validInterfacesMap for better error handling and
logging. Move Windows-specific network adapter validation logic to the
ctrld package. Key changes include:

- Add context parameter to validInterfacesMap across all platforms
- Move Windows validInterfaces to ctrld.ValidInterfaces
- Improve error handling for virtual interface detection on Linux
- Update all callers to pass appropriate context

This change improves error reporting and makes the interface validation
code more maintainable across different platforms.
2025-10-09 17:49:21 +07:00
Cuong Manh Le
b9b9cfcade cmd/cli: avoid accessing mainLog when possible
By adding a logger field to "prog" struct, and use this field inside its
method instead of always accessing global mainLog variable. This at
least ensure more consistent usage of the logger during ctrld prog
runtime, and also help refactoring the code more easily in the future
(like replacing the logger library).
2025-10-09 17:46:02 +07:00
Cuong Manh Le
fc527dbdfb all: eliminate usage of global ProxyLogger
So setting up logging for ctrld binary and ctrld packages could be done
more easily, decouple the required setup for interactive vs daemon
running.

This is the first step toward replacing rs/zerolog libary with a
different logging library.
2025-10-09 17:45:59 +07:00
Cuong Manh Le
51e58b64a5 Preparing for v2.0.0 branch merge
This commit reverts changes from v1.4.5 to v1.4.7, to prepare for v2.0.0
branch codes.

Changes includes in these releases have been included in v2.0.0 branch
already.

Details:

Revert "feat: add --rfc1918 flag for explicit LAN client support"

This reverts commit 0e3f764299.

Revert "Upgrade quic-go to v0.54.0"

This reverts commit e52402eb0c.

Revert "docs: add known issues documentation for Darwin 15.5 upgrade issue"

This reverts commit 2133f31854.

Revert "start mobile library with provision id and custom hostname."

This reverts commit a198a5cd65.

Revert "Add OPNsense new lease file"

This reverts commit 7af29cfbc0.

Revert ".github/workflows: bump go version to 1.24.x"

This reverts commit ce1a165348.

Revert "fix: ensure upstream health checks can handle large DNS responses"

This reverts commit fd48e6d795.

Revert "refactor(prog): move network monitoring outside listener loop"

This reverts commit d71d1341b6.

Revert "fix: correct Windows API constants to fix domain join detection"

This reverts commit 21855df4af.

Revert "refactor: move network monitoring to separate goroutine"

This reverts commit 66e2d3a40a.

Revert "refactor: extract empty string filtering to reusable function"

This reverts commit 36a7423634.

Revert "cmd/cli: ignore empty positional argument for start command"

This reverts commit e616091249.

Revert "Avoiding Windows runners file locking issue"

This reverts commit 0948161529.

Revert "refactor: split selfUpgradeCheck into version check and upgrade execution"

This reverts commit ce29b5d217.

Revert "internal/router: support Ubios 4.3+"

This reverts commit de24fa293e.

Revert "internal/router: support Merlin Guest Network Pro VLAN"

This reverts commit 6663925c4d.
2025-10-09 16:47:51 +07:00
Cuong Manh Le
0e3f764299 feat: add --rfc1918 flag for explicit LAN client support
Make RFC1918 listener spawning opt-in via --rfc1918 flag instead of automatic behavior.
This allows users to explicitly control when ctrld listens on private network addresses
to receive DNS queries from LAN clients, improving security and configurability.

Refactor network interface detection to better distinguish between physical and virtual
interfaces, ensuring only real hardware interfaces are used for RFC1918 address binding.
2025-09-25 16:45:56 +07:00
Cuong Manh Le
fd48e6d795 fix: ensure upstream health checks can handle large DNS responses
- Add UpstreamConfig.VerifyMsg() method with proper EDNS0 support
- Replace hardcoded DNS messages in health checks with standardized verification method
- Set EDNS0 buffer size to 4096 bytes to handle large DNS responses
- Add test case for legacy resolver with extensive extra sections
2025-08-15 22:55:47 +07:00
Cuong Manh Le
66e2d3a40a refactor: move network monitoring to separate goroutine
- Move network monitoring initialization out of serveDNS() function
- Start network monitoring in a separate goroutine during program startup
- Remove context parameter from monitorNetworkChanges() as it's not used
- Simplify serveDNS() function signature by removing unused context parameter
- Ensure network monitoring starts only once during initial run, not on reload

This change improves separation of concerns by isolating network monitoring
from DNS serving logic, and prevents potential issues with multiple
monitoring goroutines if starting multiple listeners.
2025-08-12 16:46:57 +07:00
Cuong Manh Le
b4faf82f76 all: set edns0 cookie for shared message
For cached or singleflight messages, the edns0 cookie is currently
shared among all of them, causing mismatch cookie warning from clients.
The ctrld proxy should re-set client cookies for each request
separately, even though they use the same shared answer.
2025-05-27 18:09:16 +07:00
Cuong Manh Le
00e9d2bdd3 all: do not listen on 0.0.0.0 on desktop clients
Since this may create security vulnerabilities such as DNS amplification
or abusing because the listener was exposed to the entire local network.
2025-05-15 16:59:24 +07:00
Cuong Manh Le
f27cbe3525 all: fallback to use direct IPs for ControlD assets 2025-03-26 23:17:50 +07:00
Cuong Manh Le
58c0e4f15a all: remove ipv6 check polling
netmon provides ipv6 availability during network event changes, so use
this metadata instead of wasting on polling check.

Further, repeated network errors will force marking ipv6 as disable if
were being enabled, catching a rare case when ipv6 were disabled from
cli or system settings.
2025-03-26 23:16:38 +07:00
Cuong Manh Le
c7168739c7 cmd/cli: use OS resolver as default upstream for SRV lan hostname
Since application may need SRV record for public domains, which could be
blocked by OS resolver, but not with remote upstreams.

This was reported by a Minecraft user, who seeing thing is broken after
upgrading to v1.4.0 release.
2025-02-21 20:44:34 +07:00
Alex
eff5ff580b use saved static nameservers stored for the default router interface when doing nameserver discovery
fix bad logger usages

patch darwin interface name

patch darwin interface name, debugging

make resetDNS check for static config on startup, optionally restoring static confiration as needed

fix netmon logging
2025-02-21 20:33:04 +07:00
Alex
3480043e40 handle default route changes
remove old os resolver IPs on interface down

better debugging for os resolver
2025-02-18 20:30:54 +07:00
Alex
0123ca44fb ignore ipv6 addresses from defaultRouteIP, guard against using ipv6 address as v4 default 2025-02-18 20:25:35 +07:00
Alex
7929aafe2a OS resolver retry should respect the leak_on_upstream_failure config option 2025-02-18 20:25:26 +07:00
Alex
e6de78c1fa fix leak_on_upstream_failure config param 2025-02-18 20:22:33 +07:00
Alex
81e0bad739 increase failure count for all queries with no answer 2025-02-11 19:29:48 +07:00
Alex
7d07d738dc fix failure count on OS retry 2025-02-11 19:28:55 +07:00
Alex
0fae584e65 OS resolver retry catch all 2025-02-11 19:27:50 +07:00
Alex
9e83085f2a handle old state missing interface crash 2025-02-11 19:27:46 +07:00
Alex
41a00c68ac fix down state handling 2025-02-11 19:27:41 +07:00
Alex
60e65a37a6 do the reset after recovery finished 2025-02-10 18:56:09 +07:00
Alex
98042d8dbd remove leaking logic in favor of recovery logic. 2025-02-10 18:55:36 +07:00
Cuong Manh Le
af4b826b68 cmd/cli: implement valid interfaces map for all systems
Previously, a valid interfaces map is only meaningful on Windows and
Darwin, where ctrld needs to set DNS for all physical interfaces.

With new network monitor, the valid interfaces is used for checking new
changes, thus we have to implement the valid interfaces map for all
systems.

 - On Linux, just retrieving all non-virtual interfaces.
 - On others, fallback to use default route interface only.
2025-02-10 18:45:17 +07:00
Alex
398f71fd00 fix leakingQueryReset usages 2025-02-10 18:44:52 +07:00
Alex
e1301ade96 remove context timeout 2025-02-10 18:44:46 +07:00
Alex
7a23f82192 set leakingQueryReset to prevent watchdogs from resetting dns 2025-02-10 18:44:40 +07:00
Alex
0c74838740 init os resolver after upstream recovers 2025-02-10 18:44:23 +07:00
Alex
4b05b6da7b fix missing unlock 2025-02-10 18:43:03 +07:00
Alex
375844ff1a remove handler log line 2025-02-10 18:42:59 +07:00
Alex
1d207379cb wait for healthy upstream before accepting queries on network change 2025-02-10 18:42:53 +07:00
Alex
fb49cb71e3 debounce upstream failure checking and failure counts 2025-02-10 18:41:48 +07:00
Alex
9618efbcde improve network change ip filtering logic 2025-02-10 18:41:43 +07:00
Alex
bb2210b06a ip detection debugging 2025-02-10 18:41:39 +07:00
Alex
917052723d don't overwrite OS resolver nameservers if there arent any 2025-02-10 18:41:34 +07:00
Alex
fef85cadeb filter non usabel IPs from state changes 2025-02-10 18:41:30 +07:00
Alex
4a05fb6b28 use the changed iface if no default route is set yet 2025-02-10 18:41:25 +07:00
Alex
6644ce53f2 fix interface IP CIDR parsing 2025-02-10 18:41:20 +07:00