Using the netsh command emits unexpected SOA queries, so do not use it.
While at it, also ensure that the local IPv6 address is added to the
nameservers list on systems that require an IPv6 local listener.
So changes made to DNS after ctrld has stopped won't be reverted by the
goroutine itself. The problem rarely happens on darwin, because the
networksetup command won't propagate config to /etc/resolv.conf if
nothing has changed between runs.
Without reading the documentation, users may think that "ctrld start"
will just start the ctrld service. However, this is not the case, which
may lead to unexpected results from the user's point of view.
This commit changes "ctrld start" to just start the already installed
ctrld service, so users won't lose what they installed before. If any
arguments are specified, the current behavior is kept.
By the time the self-check process runs, we already know the exact
config file being used by the ctrld service. Thus, we should just
re-read this config file directly instead of guessing it.
By making dnsFromAdapter ignore any DNS server that has the same IP
address as the adapter.
While at it, also change the OS resolver to use the ctrld bootstrap DNS
only if there are no available nameservers.
By attempting to reset DNS before starting the new ctrld process. This
way, ctrld will read the correct system DNS settings before changing
them itself.
While at it, some optimizations are made:
- "ctrld start" won't set DNS anymore. Since "ctrld run" has already
done this, the start command can just query the socket control server
and emit a proper message to users.
- The gateway won't be included as a nameserver on Windows anymore,
since the GetAdaptersAddresses Windows API always returns the correct
DNS servers of the interfaces.
- The nameservers list that the OS resolver is using will be shown
during ctrld startup, making debugging easier.
For safety reasons, ctrld creates a backup of the current binary when
running the upgrade command.
However, on systems where the ctrld status is obtained by parsing ps
command output, the current binary path is important and must be the
same as the original binary. Depending on the kernel version,
os.Executable may return the new backup binary path, aka
"ctrld_previous", instead of the original "ctrld" binary. This makes the
upgrade command see ctrld as not running after restart, so the upgrade
fails.
Fix this by recording the binary path before creating the new service,
so the ctrld service status can be checked correctly.
The "ctrld start" command runs slowly and uses more CPU than necessary.
The problem has several causes:
1. The ctrld process waits for 5 seconds before marking listeners up.
That ends up adding those seconds to the self-check process, even
though the listeners may already be available.
2. While creating the socket control client, "s.Status()" is called to
obtain the ctrld service status, so we can terminate early if the
service failed to run. However, that makes a lot of syscalls in a
hot loop, eating CPU constantly while the command is running. On
Windows, that call becomes slower with each call. The same effect can
be seen in the Windows services manager GUI: pressing the
start/stop/restart buttons fast enough raises a timeout.
3. The socket control server is started late, after all the listeners
are up. That makes the loop for creating the socket control client run
longer and use more resources than necessary.
The fixes for these problems are straightforward:
1. Remove the hard-coded 5-second wait. NotifyStartedFunc is enough to
ensure that listeners are ready to accept requests.
2. Check "s.Status()" only once, before the loop. There is already a
30-second timeout, so if anything goes wrong, the self-check process
is terminated and won't hang forever.
3. Start the socket control server earlier, so newSocketControlClient
can connect to the server with the fewest attempts, then query the
"/started" endpoint to ensure the listeners are ready.
With these fixes, "ctrld start" now runs much faster on modern machines,
taking ~1-2 seconds (previously ~5-8 seconds) to finish. On a dual-core
VM, it takes ~5-8 seconds (previously a few dozen seconds or a timeout).
---
While at it, there are two refactorings that make the code easier to
read/maintain:
- PersistentPreRun is now used in the root command to init console
logging, so we don't have to initialize it in sub-commands.
- NotifyStartedFunc now uses a channel for synchronization instead of a
mutex. This removes the ugly asymmetric lock calls, makes the code more
idiomatic, and theoretically gives better performance.
This commit implements the upgrade command, which will:
- Download the latest version for the currently running arch.
- Replace the binary on disk.
- Self-restart the ctrld service.
If the service does not start with the new binary, the old binary will
be restored and the service self-restarted again.
On BSD, the service has been un-killable since v1.3.4 through the
daemon command's "-r" option. However, when reading the remote config,
ctrld exits fatally if the config is malformed. This makes daemon
respawn a new ctrld process immediately, so the "ctrld start" command
hangs forever in a restart loop.
Since "ctrld start" already fetches the resolver config for validating
the uid, it should validate the remote config, too. This allows a better
error message to be printed, letting users know that the config is
invalid.
Further, if the remote config is invalid, we should disregard it and
generate the default working one in cd mode.
- Shortened the control socket name (on iOS the max length is 11).
- Wait for control server reply before checking for deactivation pin.
- Added a separate control socket name for mobile.
- Added a stop channel reference to the Control client constructor.
On BSD platforms, using "daemon -r" may fool the status check into
reporting that ctrld is still running when it was in fact terminated
unexpectedly. This may cause the check in newSocketControlClient to hang
forever.
Use a sane timeout value of 30 seconds, which should be enough for the
ctrld service to start under normal conditions.
If the server is running an old version of ctrld, the deactivation pin
check will return 404 Not Found. The client should treat this as no
error instead of returning an invalid pin code error.
This allows the v1.3.5 binary's `ctrld start` command to work while the
ctrld server is still running an old version. I discovered this while
testing the v1.3.5 binary on a router with an old ctrld version running.
The loop runs after the main interface DNS was set, thus the error
would just be noise to users. This commit removes the noise by making
currentStaticDNS return an additional error, so it's up to the caller
to decide whether to emit the error or not.
Further, the physical interface loop will now only log when the callback
function runs successfully. Emitting the callback error can be done in
the future, once we figure out how to detect physical interfaces in
Go portably.
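
The "return the error and let the caller decide" pattern can be
sketched as follows. staticDNS and reportStaticDNS are hypothetical
stand-ins that read from an in-memory table; they do not reflect the
real currentStaticDNS signature or how ctrld enumerates interfaces.

```go
package main

import "fmt"

// staticDNS returns the static DNS servers of iface, or an error instead of
// logging by itself. Hypothetical stand-in backed by an in-memory table.
func staticDNS(table map[string][]string, iface string) ([]string, error) {
	ns, ok := table[iface]
	if !ok {
		return nil, fmt.Errorf("no static DNS found for %q", iface)
	}
	return ns, nil
}

// reportStaticDNS returns the messages that would be logged. A failure on
// the main interface is reported; failures in the loop over the other
// interfaces are deliberately dropped, and only successes are logged.
func reportStaticDNS(table map[string][]string, mainIface string, others []string) []string {
	var msgs []string
	if ns, err := staticDNS(table, mainIface); err != nil {
		msgs = append(msgs, "error: "+err.Error())
	} else {
		msgs = append(msgs, fmt.Sprintf("%s: %v", mainIface, ns))
	}
	for _, iface := range others {
		ns, err := staticDNS(table, iface)
		if err != nil {
			continue // the caller chooses to stay quiet here
		}
		msgs = append(msgs, fmt.Sprintf("%s: %v", iface, ns))
	}
	return msgs
}

func main() {
	table := map[string][]string{"eth0": {"192.0.2.1"}}
	for _, m := range reportStaticDNS(table, "eth0", []string{"wlan0", "eth1"}) {
		fmt.Println(m)
	}
}
```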