To prevent duplicated running of checkUpstream function at the same
time, upstream monitor uses a boolean to report whether the upstream is
checking. If this boolean is true, then other calls after the first one
will be returned immediately.
However, checkUpstream does not set this boolean to false when it
finishes, thus all future calls to checkUpstream won't be run, causing
the upstream is marked as down forever.
Fixing this by ensuring the boolean is reset once checkUpstream done.
While at it, also guarding all upstream monitor operations with a mutex,
ensuring there's no race condition between marking upstream state.
We see number of failed test in Github Action, mostly on MacOS or
Windows due to the fact that goroutines are scheduled to be run
consequently.
This commit improves the test, ensuring at least 2 goroutines were
started before increasing the counting.
On some routers, change to network may trigger re-rendering
/etc/resolv.conf file, causing requests from router itself stop using
ctrld.
Fixing this by watching changes to /etc/resolv.conf, then revert them.
For Android devices, when it joins the network, it uses ctrld to resolve
its private DNS once and never reaches ctrld again. For each time, it uses
a different IPv6 address, which causes hundreds/thousands different client
IDs created for the same device, which is pointless.
Generating nextdns config must happen after stopping current ctrld
process. Otherwise, config processing may pick wrong IP+Port.
While at it, also making logging better when updating listener config:
- Change warn to info, prevent confusing that "something is wrong".
- Do not emit info when generating working default config, which may
cause duplicated messages printed.
Otherwise, network changes may not be seen on some platforms, causing
ctrld failed to recover and failing all requests.
While at it, also doing the check DNS in separate goroutine, prevent it
from blocking ctrld from notifying others that it "started". The issue
was seen when ctrld is configured as direct listener, requests are
flooded before ctrld started, causing the healtch process failed.
The provision token is only used once, then do not have any effect after
Control D uid is fetched. So making it appears in "ctrld run" command is
useless.
VPN clients often have empty MAC address, because they come from virtual
network interface. However, there's other setup/devices also create
virtual interface, but is not VPN.
Changing source of those clients to empty to prevent confustion in
clients list command output.
RMM uses non-user account which results in config + socket file being
written to a random directory, which is not a real directory that can be
accessed.
Fix this by using directory of ctrld binary as user home dir.
Some users mentioned that when there is an Internet outage, ctrld fails
to recover, crashing or locks up the router. When requests start
failing, this results in the clients emitting more queries, creating a
resource spiral of death that can brick the device entirely.
To guard against this case, this commit implement an upstream monitor
approach:
- Marking upstream as down after 100 consecutive failed queries.
- Start a goroutine to check when the upstream is back again.
- When upstream is down, answer all queries with SERVFAIL.
- The checking process uses backoff retry to reduce high requests rate.
- As long as the query succeeded, marking the upstream as alive then
start operate normally.
Currently, ctrld assumes that NetworkManager is not available if writing
to /etc/NetworkManager/conf.d return directory not exist error. That
would work on most Linux distros. However, cloud provider may do some
hacks, causing ctrld confusion and think that NetworkManager is
available.
Fixing this by checking whether NetworkManager binary presents first.
While at it, also fixing a bug when restarting NetworkManager failed
causing ctrld hangs. The go-systemd library is not clear about this, but
the waitCh channel won't never be closed if error occurred, so we must
return immediately instead of receiving from it blindly.