On some platforms, like pfsense, ntpd is not problem, so do not spawn
the DNS server for it, which may conflict with default DNS server.
While at it, also make sure that ctrld will be run at last on startup.
On DD-WRT v3.0-r52189, dnsmasq version 2.89 lease format looks like:
1685794060 <mac> <ip> <hostname> 00:00:00:00:00:04 9
It has 6 fields, while the current parser only looks for line with exact
5 fields, which is too restricted. In fact, the parser shold just skip
line with less than 4 fields, because the 4th field is the hostname,
which is the last client info that ctrld needs.
Currently, on routers that require NTP waiting, ctrld makes the cleanup
process, and restart dnsmasq for restoring default DNS config, so ntpd
can query the NTP servers. It did work, but the code will depends on
router platforms.
Instead, we can spawn a plain DNS listener before PreRun on routers,
this listener will serve NTP dns queries and once ntp is configured, the
listener is terminated and ctrld will start serving using its configured
upstreams.
While at it, also fix the userHomeDir function on freshtomato, which
must return the binary directory for routers that requires JFFS.
In commit 670879d1, the backoff is changed to be passed a real error,
instead of a place holder. However, the test query may return a failed
response with a nil error, causing the backoff never fire.
Fixing this by ensuring the error is wrapped, so the backoff always see
a non-nil error.
On WSL 1, the routing table do not contain default route, causing ctrld
failed to get the default iface for setting DNS. However, WSL 1 only use
/etc/resolv.conf for setting up DNS, so the interface does not matter,
because the setting is applied global anyway.
To fix it, just return "lo" as the default interface name on WSL 1.
While at it, also removing the useless service.Logger call, which is not
unified with the current logger, and may cause false positive on system
where syslog is not configured properly (like WSL 1).
Also passing the real error when doing sel-check to backoff, so we don't
have to use a place holder error.
In split mode, the code must check for ipv6 availability to return the
correct network stack. Otherwise, we may end up using "tcp6-tls" even
though the upstream IP is an ipv4.
The assignment is changed wrongly in process of refactoring parallel
dialer for resolving bootstrap IP.
While at it, also satisfy staticheck for jffs not enabled error.
On some Merlin routers, the time is broken when system reboot, and need
to wait for NTP synced to get the correct time. For fetching API in cd
mode successfully, ctrld need to wait until NTP set the time correctly,
otherwise, the certificate validation would complain.
On some Merlin routers, due to ntp bug, after rebooing, dnsmasq config
was restored to default without ctrld changes, causing ctrld stop
working. Workaround this problem by catching restart diskmon event,
which is triggered by ntpd_synced, then restart dnsmasq.
When bootstrapping, if the network changed, for example, firewall rules
changed during VPN connection, the bootstrap IPs may not be resolved, so
ctrld won't work. Since bootstrap IPs is necessary for ctrld to work
properly, we should wait until we can resolve upstream IP before we can
start serving requests.
Instead of always doubling the request, first we wrap the request with a
failover timeout, 500ms, which is an average time for a normal request.
If this request failed, trigger re-bootstrapping and retry the request.
When network changes, for example: connect/disconnect VPN, the old
connection will become broken, but still can be re-used for new
requests. That would cause un-necessary delay for ctrld clients:
- Time 0 - do request with broken transport, 5s timeout.
- Time 0.5 - network stack become usable.
- Time 5 - timeout reached.
- Time 5.1 - do request with new transport -> success.
Instead, we can do two requests in parallel, with the failover one using
a fresh new transport. So if the main one is broken, we still can get
the result from the failover one.