On routers where we want to wait for NTP by checking nvram key. Before
waiting, we clean up the router to ensure it's restored to original
state. However, router.Cleanup is not idempotent, causing dnsmasq
restarted. On tomato/ddwrt, restarting have no delay, and spawning new
dnsmasq process immediately. On merlin, somehow it takes time to spawn
new dnsmasq process, causing ctrld wrongly think there's no one
listening on port 53.
Fixing this by ensuring router.Cleanup is idempotent. While at it, also
adding "ntp_done" to nvram key, which is now using on latest ddwrt.
For local config, we don't want to alter what user explicitly set, and
only try filling in missing value.
While at it, also remove the dnsmasq port delete on openwrt, we don't
need that hack anymore.
Config fetching/generating in cd mode is currently weird, error prone,
and easy for user to break ctrld when using custom config.
This commit reworks the flow:
- Fetching config from Control D API.
- No custom config, use the current default config.
- If custom config presents, but there's no listener, use 0.0.0.0:53.
- Try listening on current ip+port config, if ok, ctrld could be a
direct listener with current setup, moving on.
- If failed, trying 127.0.0.1:53.
- If failed, trying current ip + port 5354
- If still failed, pick a random ip:port pair, retry until listening ok.
With this flow, thing is more predictable/stable, and help removing the
Config interface for router.
Depending on system, the output of `/etc/init.d/ctrld status` can be
either "Running" or "running", we must do in-sensitive comparison to get
the right status of ctrld.
The current error message is not much helpful, not all users are able to
investigate system log file to find the reason.
Instead, gathering the log output of "ctrld run" command, and if error
happens or self-check failed, print the log to users.
On some Merlin routers reported by users, ctrld some how is not stopped
properly. So the router does not have a working DNS at boot time to do
ntp synchronization.
To fix it, just clean up the router before start waiting for ntp ready.
Firewalla ignores 127.0.0.1 in all VLAN config, so making 127.0.0.1 as
dnsmasq upstream would break thing when multiple VLAN presents.
To deal with this, we need to gather all interfaces available, and
making them as upstream of dnsmasq. Then changing ctrld to listen on all
interfaces, too.
It also leads to better improvement for dnsmasq configuration template,
as the upstream server can now be generated dynamically instead of hard
coding to 127.0.0.1:5354.
On Ubuntu 18.04 VM with some cloud provider, using dbus call to set DNS
is forbidden. A possible solution is stopping networkd entirely then
using systemd-resolve to set DNS when ctrld starts.
While at it, only set DNS during start command on Windows. On other
platforms, "ctrld run" does set DNS in service mode already.
When using systemd-resolved, only change listener address to default
route interface address if a loopback address is used.
Also fixing a bug in upstream tailscale code for checking in container.
See tailscale/tailscale#8444
On Firewalla, lo interface is excluded in all dnsmasq settings of all
interfaces, to prevent conflicts. The one that ctrld adds in
dnsmasq_local directory could not work if there're multiple dnsmasq
configs for multiple interfaces (real example from an user who uses
VLAN in router setup).
Instead, if we detect 127.0.0.1 on Firewalla, fallback to "br0"
interface IP address instead.