The bootstrap process has two issues that can make ctrld stop resolving
after restarting machine host.
ctrld uses bootstrap DNS and os nameservers for resolving upstream. On
unix, /etc/resolv.conf content is used to get available nameservers.
This works well when installing ctrld. However, after being installed,
ctrld may modify the content of /etc/resolv.conf itself, to make other
apps use its listener as DNS resolver. So when ctrld starts after OS
restart, it ends up using [bootstrap DNS + ctrld's listener], for
resolving upstream. At this moment, if ctrld could not contact bootstrap
DNS for any reason, upstream domain will not be resolved.
For above reason, an upstream may not have bootstrap IPs after ctrld
starts. When re-bootstrapping, if there's no bootstrap IPs, ctrld should
call the setup bootstrap process again. Currently, it does not, causing
all queries failed.
This commit fixes above issue by adding mechanism for retrieving OS
nameservers properly, by querying routing table information:
- Parsing /proc/net subsystem on Linux.
- For BSD variants, just fetching routing information base from OS.
- On Windows, just include the gateway information when reading iface.
The fixing for second issue is trivial, just kickoff a bootstrap process
if there's no bootstrap IPs when re-boostrapping.
While at it, also ensure that fetching resolver information from
ControlD API is also used the same approach.
Fixes#34
For os resolver, ctrld queries against all servers concurrently, and get
the first success result back. However, if all server failed, the result
channel is not closed, causing ctrld hang.
Fixing this by closing the result channel once getting back all response
from servers.
While at it, also shorten the backoff time when waiting for network up,
ctrld should serve as fast as possible after network is available.
Updates #34
- Include version/OS information when logging
- Make time field human readable in log file
- Force root privilege when running status command on darwin
Updates #34
When startup, ctrld waits for network up before calling s.Run to starts
its logic. However, if network is down on startup, ctrld will hang on
waiting for network up. That causes OS service manager unhappy, as ctrld
do not response to it, marking ctrld as failure service and never start
ctrld again.
To fix this, we should call s.Run as soon as possible, and use a channel
for waiting a signal that we can actual do our logic after network up.
Update #34
Avoiding reading/writing global config, causing a data race. While at
it, also guarding read/write access to cfg.Service.AllocateIP field,
since when it is read/write by multiple goroutines.
We only need on demand information when re-bootstrapping. On Bootsrap,
this is already checked by ctrldnet.Up, so on demand query will cause
un-necessary slow down if external ipv6 is slow to response.
On Windows host with StarLink network, ctrld hangs on startup for ~30s
before continue running. This dues to IPv6 is configured but no external
IPv6 can be reached. When probing stack, ctrld is dialing using ipv6
without any timeout set, so the dialing timeout is enforced by OS.
This commit adds a timeout for probing dialer, so we ensure the probing
process will fail fast.
So we don't have to worry about network stack changes causes an upstream
to be broken. Just send requests to all nameservers concurrently, and
get the first success response.
Instead of re-query DNS record for upstream when re-bootstrapping, just
query all records on startup, then selecting the next bootstrap ip
depends on the current network stack.
For better recovery and dealing with network stack changes, this commit
change the request flow to:
failure of any kind -> recreate transport/re-bootstrap -> retry once
That would make ctrld recover from all scenarios in theory.
At startup, ctrld gathers bootstrap IP information and use this
bootstrap IP for connecting to upstream. However, in case the network
stack changed, for example, dues to VPN connection, ctrld will still use
this old (maybe invalid) bootstrap IP for the current network stack.
This commit rework the discovering process, and re-initializing the
bootstrap IP if connecting to upstream failed.