By recording both the error and output of external commands.
While at it:
- Removing un-necessary usages of sudo, since ctrld already
running with root privilege.
- Removing un-used function triggerCaptiveCheck.
So these events will be recorded separately from normal runtime log,
making troubleshooting later more easily.
While at it, only update ctrld.ProxyLogger for runCmd, it's the only one
which needs to log the query when proxying requests.
flush dns cache, manually hit captive portal on MacOS
fix real ip in debug log
treat all upstreams as down upon network change
delay upstream checks when leaking queries on network changes
debugging
skip type 24 in nameserver detection
skip type 24 in nameserver detection
remove interface type check from valid interfaces for now
skip non hardware interfaces in DNS nameserver lookup
ignore win api log output
set retries to 5 and 1s backoff
reset DNS when upgrading to make sure we get the proper OS nameservers on start
init running iface for upgrade
update windows service options for auto restarts on failure
make upgrade use the actual stop and start commands
fix the windows service retry logic
fix the windows service retry logic
task debugging
more task debugging
windows service name fix
windows service name fix
fix start command args
fix restart delay
dont recover from non crash failures
fix upgrade flow
fix logging
fix logging
try to enable nameserver logs
try to enable nameserver logs
handle flags in interface state changes
debugging
debugging
debugging
fix state detection, AD status fix
fix debugging line
more dc info
always log state changes
remove unused method
windows AD IP discovery
windows AD IP discovery
windows AD IP discovery
For normal OS resolver, ctrld does not use local addresses as nameserver
to avoid possible looping. However, on AD environment with local DNS
running, AD queries must be sent to the local DNS server for proper
resolving.
fix test
use upstreamIS var
init map, fix watcher flag
attempt to detect network changes
attempt to detect network changes
cancel and rerun reinitializeOSResolver
cancel and rerun reinitializeOSResolver
cancel and rerun reinitializeOSResolver
ignore invalid inferaces
ignore invalid inferaces
allow OS resolver upstream to fail
dont wait for dnsWait group on reinit, check for active interfaces to trigger reinit
fix unused var
simpler active iface check, debug logs
dont spam network service name patching on Mac
dont wait for os resolver nameserver testing
remove test for osresovlers for now
async nameserver testing
remove unused test
smol tweaks to nameserver test queries
fix restoreDNS errors
add some debugging information
fix wront type in log msg
set send logs command timeout to 5 mins
when the runningIface is no longer up, attempt to find a new interface
prefer default route, ignore non physical interfaces
prefer default route, ignore non physical interfaces
add max context timeout on performLeakingQuery with more debug logs
The current flow involves marking OS resolver as down, which is not
right at all, since ctrld depends on it for leaking queries.
This commits implements new flow, which ctrld will restore DNS settings
once leaking marked, allowing queries go to OS resolver until the
internet connection is established.
Since SRV record is mostly useful in AD environment. Even in non-AD one,
the OS resolver could still resolve the query for external services.
Users who want special treatment can still specify domain rules to
forward requests to ControlD upstreams explicitly.
So it would work in more general case than just captive portal network,
which ctrld have supported recently.
Uses who may want no leaking behavior can use a config to turn off this
feature.
So it can be run regardless of ctrld current status. This prevents a
racy behavior when reset DNS task restores DNS settings of the system,
but current running ctrld process may revert it immediately.
For query domain that matches "uid.verify.controld.com" in cd mode, and
the uid has the same value with "--cd" flag, ctrld will fetch uid config
from ControlD API, using this config if valid.
This is useful for force syncing API without waiting until the API
reload ticker fire.
ControlD have global list of known captive portals that user can augment
with proper setup. However, this requires manual actions, and involving
restart ctrld for taking effects.
By allowing ctrld "leaks" DNS queries to OS resolver, this process
becomes automatically, the captive portal could intercept these queries,
and as long as it was passed, ctrld will resume normal operation.
The "ctrld start" command is running slow, and using much CPU than
necessary. The problem was made because of several things:
1. ctrld process is waiting for 5 seconds before marking listeners up.
That ends up adding those seconds to the self-check process, even
though the listeners may have been already available.
2. While creating socket control client, "s.Status()" is called to
obtain ctrld service status, so we could terminate early if the
service failed to run. However, that would make a lot of syscall in a
hot loop, eating the CPU constantly while the command is running. On
Windows, that call would become slower after each calls. The same
effect could be seen using Windows services manager GUI, by pressing
start/stop/restart button fast enough, we could see a timeout raised.
3. The socket control server is started lately, after all the listeners
up. That would make the loop for creating socket control client run
longer and use much resources than necessary.
Fixes for these problems are quite obvious:
1. Removing hard code 5 seconds waiting. NotifyStartedFunc is enough to
ensure that listeners are ready for accepting requests.
2. Check "s.Status()" only once before the loop. There has been already
30 seconds timeout, so if anything went wrong, the self-check process
could be terminated, and won't hang forever.
3. Starting socket control server earlier, so newSocketControlClient can
connect to server with fewest attempts, then querying "/started"
endpoint to ensure the listeners have been ready.
With these fixes, "ctrld start" now run much faster on modern machines,
taking ~1-2 seconds (previously ~5-8 seconds) to finish. On dual cores
VM, it takes ~5-8 seconds (previously a few dozen seconds or timeout).
---
While at it, there are two refactoring for making the code easier to
read/maintain:
- PersistentPreRun is now used in root command to init console logging,
so we don't have to initialize them in sub-commands.
- NotifyStartedFunc now use channel for synchronization, instead of a
mutex, making the ugly asymetric calls to lock goes away, making the
code more idiom, and theoretically have better performance.