Currently, ctrld watches changes to /etc/resolv.conf file, then
reverting to the expected settings. However, if /etc/resolv.conf is a
symlink, changes made to the target file maynot be seen if it's not
under /etc directory.
To fix this, just evaluate the /etc/resolv.conf file before watching it.
Once per minute, ctrld will check if DNS settings was changed or not. If
yes, re-applying the proper settings for system interfaces.
For now, this is only applied when deactivation_pin was set.
Since the OS resolver only returns response with NOERROR first, it's
safe to use ControlD public DNS in parallel with system DNS. Local
domains would resolve only though local resolvers, because public ones
will return NXDOMAIN response.
Once resource record (RR) was used to extract necessary information, it
should be freed in memory. However, the current way that ctrld declare
the RRs causing the slices to be heap allocated, and stay in memory
longer than necessary. On system with low capacity, or firmware that GC
does not run agressively, it may causes the system memory exhausted.
To fix it, prevent RRs to be heap allocated, so they could be freed
immediately after each iterations.
If we see permission denied error when probing dns, that mean the
current ctrld process won't be able to do that anyway. So the probing
loop must be terminated to prevent waste of resources, or false positive
from system firewall because of too many failed attempts.
By making dnsFromAdapter ignores DNS server which is the same IP address
of the adapter.
While at it, also changes OS resolver to use ctrld bootstrap DNS only if
there's no available nameservers.
By attempting to reset DNS before starting new ctrld process. This way,
ctrld will read the correct system DNS settings before changing itself.
While at it, some optimizations are made:
- "ctrld start" won't set DNS anymore, since "ctrld run" has already did
this, start command could just query socket control server and emittin
proper message to users.
- The gateway won't be included as nameservers on Windows anymore,
since the GetAdaptersAddresses Windows API always returns the correct
DNS servers of the interfaces.
- The nameservers list that OS resolver is using will be shown during
ctrld startup, making it easier for debugging.
For safety reason, ctrld will create a backup of the current binary when
running upgrade command.
However, on systems where ctrld status is got by parsing ps command
output, the current binary path is important and must be the same with
the original binary. Depends on kernel version, using os.Executable may
return new backup binary path, aka "ctrld_previous", not the original
"ctrld" binary. This causes upgrade command see ctrld as not running
after restart -> upgrade failed.
Fixing this by recording the binary path before creating new service, so
the ctrld service status can be checked correctly.
It seems to be a Windows bug when removing a forwarder and adding a new
one immediately then causing both of them to be added to forwarders
list. This could be verified easily using powershell commands.
Since the forwarder will be removed when ctrld stop/uninstall, ctrld run
could avoid that action, not only help mitigate above bug, but also not
waste host resources.
On low resources Windows Server VM, profiling shows the bottle neck when
interacting with Windows DNS server to add/remove forwarders using by
calling external powershell commands. This happens because ctrld try
setting DNS before it runs.
However, it would be better if ctrld only sets DNS after all its
listeners ready. So it won't block ctrld from receiving requests.
With this change, self-check process on dual Core Windows server VM now
runs constantly fast, ~2-4 seconds when running multiple times in a row.
The "ctrld start" command is running slow, and using much CPU than
necessary. The problem was made because of several things:
1. ctrld process is waiting for 5 seconds before marking listeners up.
That ends up adding those seconds to the self-check process, even
though the listeners may have been already available.
2. While creating socket control client, "s.Status()" is called to
obtain ctrld service status, so we could terminate early if the
service failed to run. However, that would make a lot of syscall in a
hot loop, eating the CPU constantly while the command is running. On
Windows, that call would become slower after each calls. The same
effect could be seen using Windows services manager GUI, by pressing
start/stop/restart button fast enough, we could see a timeout raised.
3. The socket control server is started lately, after all the listeners
up. That would make the loop for creating socket control client run
longer and use much resources than necessary.
Fixes for these problems are quite obvious:
1. Removing hard code 5 seconds waiting. NotifyStartedFunc is enough to
ensure that listeners are ready for accepting requests.
2. Check "s.Status()" only once before the loop. There has been already
30 seconds timeout, so if anything went wrong, the self-check process
could be terminated, and won't hang forever.
3. Starting socket control server earlier, so newSocketControlClient can
connect to server with fewest attempts, then querying "/started"
endpoint to ensure the listeners have been ready.
With these fixes, "ctrld start" now run much faster on modern machines,
taking ~1-2 seconds (previously ~5-8 seconds) to finish. On dual cores
VM, it takes ~5-8 seconds (previously a few dozen seconds or timeout).
---
While at it, there are two refactoring for making the code easier to
read/maintain:
- PersistentPreRun is now used in root command to init console logging,
so we don't have to initialize them in sub-commands.
- NotifyStartedFunc now use channel for synchronization, instead of a
mutex, making the ugly asymetric calls to lock goes away, making the
code more idiom, and theoretically have better performance.
The systemd-networkd-wait-online is only required if systemd-networkd
is managing any interfaces. Otherwise, it will hang and block ctrld from
starting.
See: https://github.com/systemd/systemd/issues/23304