After installing as a system service, "ctrld start" does an end-to-end
test for ensuring DNS can be resolved correctly. However, in case the
system is mis-configured (by firewall, other softwares ...) and the test
query could not be sent to ctrld listener, the current error message is
not helpful, causing the confusion from users perspective.
To improve this, selfCheckStatus function now returns the actual status
and error during its process. The caller can now rely on the service
status and the error to produce more useful/friendly message to users.
In some old Windows systems, s.Uninstall does not remove the service
completely at the time s.Install was running, prevent ctrld from being
installed again.
Workaround this by attempting to uninstall ctrld several times, re-check
for service status after each attempt to ensure it was uninstalled.
Windows may raise WSAEHOSTUNREACH instead WSAENETUNREACH in case of
network not available when resuming from sleep or switching network, so
checkUpstream is never kicked in for this type of error.
To prevent duplicated running of checkUpstream function at the same
time, upstream monitor uses a boolean to report whether the upstream is
checking. If this boolean is true, then other calls after the first one
will be returned immediately.
However, checkUpstream does not set this boolean to false when it
finishes, thus all future calls to checkUpstream won't be run, causing
the upstream is marked as down forever.
Fixing this by ensuring the boolean is reset once checkUpstream done.
While at it, also guarding all upstream monitor operations with a mutex,
ensuring there's no race condition between marking upstream state.
We see number of failed test in Github Action, mostly on MacOS or
Windows due to the fact that goroutines are scheduled to be run
consequently.
This commit improves the test, ensuring at least 2 goroutines were
started before increasing the counting.
On some routers, change to network may trigger re-rendering
/etc/resolv.conf file, causing requests from router itself stop using
ctrld.
Fixing this by watching changes to /etc/resolv.conf, then revert them.
For Android devices, when it joins the network, it uses ctrld to resolve
its private DNS once and never reaches ctrld again. For each time, it uses
a different IPv6 address, which causes hundreds/thousands different client
IDs created for the same device, which is pointless.
Generating nextdns config must happen after stopping current ctrld
process. Otherwise, config processing may pick wrong IP+Port.
While at it, also making logging better when updating listener config:
- Change warn to info, prevent confusing that "something is wrong".
- Do not emit info when generating working default config, which may
cause duplicated messages printed.
Otherwise, network changes may not be seen on some platforms, causing
ctrld failed to recover and failing all requests.
While at it, also doing the check DNS in separate goroutine, prevent it
from blocking ctrld from notifying others that it "started". The issue
was seen when ctrld is configured as direct listener, requests are
flooded before ctrld started, causing the healtch process failed.
The provision token is only used once, then do not have any effect after
Control D uid is fetched. So making it appears in "ctrld run" command is
useless.