On some Merlin routers reported by users, ctrld some how is not stopped
properly. So the router does not have a working DNS at boot time to do
ntp synchronization.
To fix it, just clean up the router before start waiting for ntp ready.
Firewalla ignores 127.0.0.1 in all VLAN config, so making 127.0.0.1 as
dnsmasq upstream would break thing when multiple VLAN presents.
To deal with this, we need to gather all interfaces available, and
making them as upstream of dnsmasq. Then changing ctrld to listen on all
interfaces, too.
It also leads to better improvement for dnsmasq configuration template,
as the upstream server can now be generated dynamically instead of hard
coding to 127.0.0.1:5354.
On Firewalla, lo interface is excluded in all dnsmasq settings of all
interfaces, to prevent conflicts. The one that ctrld adds in
dnsmasq_local directory could not work if there're multiple dnsmasq
configs for multiple interfaces (real example from an user who uses
VLAN in router setup).
Instead, if we detect 127.0.0.1 on Firewalla, fallback to "br0"
interface IP address instead.
When running on routers, ctrld leverages default setup, let dnsmasq runs
on port 53, and forward queries to ctrld listener on port 5354. However,
this setup is not serialized to config file, causing confusion to users.
Fixing this by writing the correct routers setup to config file. While
at it, updating documentation to refelct that, and also adding note that
changing default router setup could break things.
UniFi Gateway (USG) uses its own DNS forwarding rule, which is
configured default in /etc/dnsmasq.conf file. Adding ctrld own config in
/etc/dnsmasq.d won't take effects. Instead, we must make changes
directly to /etc/dnsmasq.conf, configuring ctrld as the only upstream.
The current state of ctrld is very "high stakes" and easy to mess up,
and is unforgiving when "ctrld start" failed. That would cause the
router is in broken state, unrecoverable.
This commit makes these changes to improve the state:
- Moving router setup process after ctrld listeners are ready, so
dnsmasq won't flood requests to ctrld even though the listeners are
not ready to serve requests.
- On router, when ctrld stopped, restore router DNS setup. That leaves
the router in good state on reboot/startup, help removing the custom
DNS server for NTP synchronization on some routers.
- If self-check failed, uninstall ctrld to restore router to good
state, prevent confusion that ctrld process is still running even
though self-check reports it did not started.
On some platforms, like pfsense, ntpd is not problem, so do not spawn
the DNS server for it, which may conflict with default DNS server.
While at it, also make sure that ctrld will be run at last on startup.
On DD-WRT v3.0-r52189, dnsmasq version 2.89 lease format looks like:
1685794060 <mac> <ip> <hostname> 00:00:00:00:00:04 9
It has 6 fields, while the current parser only looks for line with exact
5 fields, which is too restricted. In fact, the parser shold just skip
line with less than 4 fields, because the 4th field is the hostname,
which is the last client info that ctrld needs.
Currently, on routers that require NTP waiting, ctrld makes the cleanup
process, and restart dnsmasq for restoring default DNS config, so ntpd
can query the NTP servers. It did work, but the code will depends on
router platforms.
Instead, we can spawn a plain DNS listener before PreRun on routers,
this listener will serve NTP dns queries and once ntp is configured, the
listener is terminated and ctrld will start serving using its configured
upstreams.
While at it, also fix the userHomeDir function on freshtomato, which
must return the binary directory for routers that requires JFFS.
The assignment is changed wrongly in process of refactoring parallel
dialer for resolving bootstrap IP.
While at it, also satisfy staticheck for jffs not enabled error.
On some Merlin routers, the time is broken when system reboot, and need
to wait for NTP synced to get the correct time. For fetching API in cd
mode successfully, ctrld need to wait until NTP set the time correctly,
otherwise, the certificate validation would complain.
On some Merlin routers, due to ntp bug, after rebooing, dnsmasq config
was restored to default without ctrld changes, causing ctrld stop
working. Workaround this problem by catching restart diskmon event,
which is triggered by ntpd_synced, then restart dnsmasq.
This commit add the ability for ctrld to gather client information,
including mac/ip/hostname, and send to Control-D server through a
config per upstream.
- Add send_client_info upstream config.
- Read/Watch dnsmasq leases files on supported platforms.
- Add corresponding client info to DoH query header
All of these only apply for Control-D upstream, though.
So we don't have to depend on network stack probing to decide whether
ipv4 or ipv6 will be used.
While at it, also prevent a race report when doing the same parallel
resolving for os resolver, even though this race is harmless.
The bootstrap process has two issues that can make ctrld stop resolving
after restarting machine host.
ctrld uses bootstrap DNS and os nameservers for resolving upstream. On
unix, /etc/resolv.conf content is used to get available nameservers.
This works well when installing ctrld. However, after being installed,
ctrld may modify the content of /etc/resolv.conf itself, to make other
apps use its listener as DNS resolver. So when ctrld starts after OS
restart, it ends up using [bootstrap DNS + ctrld's listener], for
resolving upstream. At this moment, if ctrld could not contact bootstrap
DNS for any reason, upstream domain will not be resolved.
For above reason, an upstream may not have bootstrap IPs after ctrld
starts. When re-bootstrapping, if there's no bootstrap IPs, ctrld should
call the setup bootstrap process again. Currently, it does not, causing
all queries failed.
This commit fixes above issue by adding mechanism for retrieving OS
nameservers properly, by querying routing table information:
- Parsing /proc/net subsystem on Linux.
- For BSD variants, just fetching routing information base from OS.
- On Windows, just include the gateway information when reading iface.
The fixing for second issue is trivial, just kickoff a bootstrap process
if there's no bootstrap IPs when re-boostrapping.
While at it, also ensure that fetching resolver information from
ControlD API is also used the same approach.
Fixes#34
For os resolver, ctrld queries against all servers concurrently, and get
the first success result back. However, if all server failed, the result
channel is not closed, causing ctrld hang.
Fixing this by closing the result channel once getting back all response
from servers.
While at it, also shorten the backoff time when waiting for network up,
ctrld should serve as fast as possible after network is available.
Updates #34