The current state of ctrld is "high stakes": it is easy to mess up and
unforgiving when "ctrld start" fails, which can leave the router in a
broken, unrecoverable state.
This commit makes these changes to improve the situation:
- Move the router setup process after the ctrld listeners are ready, so
dnsmasq won't flood ctrld with requests while the listeners are not
yet ready to serve them.
- On routers, restore the router DNS setup when ctrld stops. That leaves
the router in a good state on reboot/startup and helps remove the
custom DNS server for NTP synchronization on some routers.
- If the self-check fails, uninstall ctrld to restore the router to a
good state. This prevents the confusing situation where the ctrld
process is still running even though the self-check reports that it
did not start.
The current transport setup uses a mutex lock for synchronization.
This works fine on a normal device, but on low-capacity routers the
high lock contention may hurt performance and cause ctrld to hang.
Using atomic operations instead of a mutex lock for synchronization
yields better performance:
- There is no lock, so other requests won't be blocked. Even if these
requests use the old, broken transport, that is fine, because the
client will retry them later.
- The transport is now set up once, on demand, when it is first
accessed or when re-bootstrapping is signaled. The first call to
dohTransport blocks the others, but the transport is warmed up before
ctrld starts serving requests, so client requests won't be affected.
This helps ctrld handle requests better when running on low-capacity
devices.
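The lock-free access pattern described above can be sketched roughly as
follows. This is a minimal illustration, not the actual ctrld code: the
upstream type, its fields, and signalRebootstrap are hypothetical names,
and an empty http.Transport stands in for the real setup.

```go
package main

import (
	"net/http"
	"sync"
	"sync/atomic"
)

// upstream is a hypothetical holder for one DoH upstream's transport.
type upstream struct {
	setupOnce   sync.Once    // guards the very first setup
	transport   atomic.Value // holds a *http.Transport once set up
	rebootstrap int32        // 1 when a rebuild has been requested
}

// signalRebootstrap asks for the transport to be rebuilt on next access.
func (u *upstream) signalRebootstrap() {
	atomic.StoreInt32(&u.rebootstrap, 1)
}

// dohTransport returns the current transport, building it on demand.
func (u *upstream) dohTransport() *http.Transport {
	// First access: sync.Once makes concurrent callers wait for one setup.
	u.setupOnce.Do(func() { u.transport.Store(&http.Transport{}) })

	// Rebuild at most once per signal. Only the caller that wins the
	// compare-and-swap does the work; the others keep using the old
	// transport and rely on the client retrying if it is broken.
	if atomic.CompareAndSwapInt32(&u.rebootstrap, 1, 0) {
		u.transport.Store(&http.Transport{})
	}
	return u.transport.Load().(*http.Transport)
}
```

Calling dohTransport once at startup warms the transport up, so the
blocking first call never happens on a client request path.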
Furthermore, the transport configuration is tweaked for better default
performance:
- MaxIdleConnsPerHost is set to 100 (the default is 2), which allows
more connections to be reused and reduces the load of opening and
closing connections on demand. See [1] for a real-world example.
- Because MaxIdleConnsPerHost is raised, the transport must explicitly
close its idle connections once it is GC-ed.
- The TLS client session cache is now enabled.
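A transport configured along these lines might look like the sketch
below. The function name newDoHTransport is an assumption for
illustration, and ForceAttemptHTTP2 is not mentioned in this commit; it
is included only because net/http stops negotiating HTTP/2 automatically
once a custom TLSClientConfig is supplied.

```go
package main

import (
	"crypto/tls"
	"net/http"
	"runtime"
)

// newDoHTransport is a hypothetical constructor showing the three tweaks.
func newDoHTransport() *http.Transport {
	t := &http.Transport{
		// Keep many idle connections per upstream host (default is 2).
		MaxIdleConnsPerHost: 100,
		TLSClientConfig: &tls.Config{
			// Enable TLS session resumption; capacity 0 picks the default.
			ClientSessionCache: tls.NewLRUClientSessionCache(0),
		},
		// Assumption, not from the commit: keep HTTP/2 enabled even
		// though a custom TLSClientConfig is set.
		ForceAttemptHTTP2: true,
	}
	// Close idle connections when the transport is garbage collected.
	runtime.SetFinalizer(t, func(t *http.Transport) {
		t.CloseIdleConnections()
	})
	return t
}
```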
Last but not least, the upstream ping process is reworked. The DoH
transport is an HTTP transport, so a HEAD request is enough to warm
up the transport; a full DNS query is not needed.
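As a rough sketch of such a warm-up, assuming a hypothetical helper
named pingUpstream and a caller-supplied timeout, a HEAD request can be
sent straight through the transport. The response status is ignored
here, on the assumption that any HTTP response means the connection was
established.

```go
package main

import (
	"context"
	"io"
	"net/http"
	"time"
)

// pingUpstream sends a HEAD request through rt to warm up the connection.
// A nil rt falls back to http.DefaultTransport.
func pingUpstream(rt http.RoundTripper, endpoint string, timeout time.Duration) error {
	if rt == nil {
		rt = http.DefaultTransport
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodHead, endpoint, nil)
	if err != nil {
		return err
	}
	resp, err := rt.RoundTrip(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Drain the (normally empty) body so the connection can be reused.
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}
```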
[1]: https://gitlab.com/gitlab-org/gitlab-pages/-/merge_requests/274
Currently, there is no upper bound on how many requests ctrld will
handle at a time. This can be a problem on some low-capacity routers,
where CPU and RAM are very limited.
This commit adds a configuration option to limit how many requests are
handled concurrently. The default is 256, which should work well for
most routers (dnsmasq's default limit on concurrent requests is 150).
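One common way to enforce such a bound in Go is a buffered channel used
as a semaphore. The sketch below assumes hypothetical names (limiter,
newLimiter, defaultMaxConcurrent) and is not necessarily how ctrld
implements the limit.

```go
package main

// defaultMaxConcurrent mirrors the default of 256 described above.
const defaultMaxConcurrent = 256

// limiter bounds how many handlers run at once.
type limiter struct {
	slots chan struct{}
}

// newLimiter returns a limiter allowing max concurrent handlers.
// A non-positive max falls back to the default.
func newLimiter(max int) *limiter {
	if max <= 0 {
		max = defaultMaxConcurrent
	}
	return &limiter{slots: make(chan struct{}, max)}
}

// do runs handle once a slot is free and releases the slot afterwards.
func (l *limiter) do(handle func()) {
	l.slots <- struct{}{}        // blocks while all slots are taken
	defer func() { <-l.slots }() // free the slot when handle returns
	handle()
}

// inFlight reports how many handlers currently hold a slot.
func (l *limiter) inFlight() int {
	return len(l.slots)
}
```

Each incoming request would call do from its own goroutine; requests
beyond the limit wait for a slot instead of consuming more CPU and RAM.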
The OpenWrt version on old GL-inet devices does not support checking
status using /etc/init.d/<service_name>, so the sysV wrapping trick
won't work. Instead, we need to parse the "ps" command output to check
whether the ctrld process is running.
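A simplified sketch of that check follows, assuming the "ps" output has
already been captured as a string. The function name processRunning is
hypothetical. Note that this naive match looks at every field, so a
line such as "grep ctrld" would also count as a match.

```go
package main

import (
	"bufio"
	"strings"
)

// processRunning reports whether psOutput contains a line with a field
// equal to name or to a path ending in "/name".
func processRunning(psOutput, name string) bool {
	sc := bufio.NewScanner(strings.NewReader(psOutput))
	for sc.Scan() {
		for _, field := range strings.Fields(sc.Text()) {
			if field == name || strings.HasSuffix(field, "/"+name) {
				return true
			}
		}
	}
	return false
}
```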
While at it, make newService a wrapper around the service.New function.
This prevents callers from calling the latter without a following call
to the former, which would cause a mismatch in service operations.
On some platforms, like pfSense, ntpd is not a problem, so do not spawn
the DNS server for it, which may conflict with the default DNS server.
While at it, also make sure that ctrld runs last on startup.
Currently, on routers that require waiting for NTP, ctrld runs the
cleanup process and restarts dnsmasq to restore the default DNS config,
so ntpd can query the NTP servers. This works, but the code depends on
the router platform.
Instead, we can spawn a plain DNS listener before PreRun on routers.
This listener serves NTP DNS queries; once NTP is configured, the
listener is terminated and ctrld starts serving using its configured
upstreams.
While at it, also fix the userHomeDir function on FreshTomato, which
must return the binary directory for routers that require JFFS.
In commit 670879d1, the backoff was changed to be passed a real error
instead of a placeholder. However, the test query may return a failed
response with a nil error, causing the backoff to never fire.
Fix this by ensuring the error is wrapped, so the backoff always sees
a non-nil error.
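The idea can be illustrated with the sketch below. It is one possible
way to guarantee a non-nil error and not necessarily the exact change
made here; errSelfCheck and selfCheckErr are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errSelfCheck is a hypothetical sentinel for a failed self-check query.
var errSelfCheck = errors.New("self-check query failed")

// selfCheckErr maps the outcome of a test query to the error handed to
// the backoff. ok reports whether the DNS response indicated success.
func selfCheckErr(ok bool, err error) error {
	switch {
	case err != nil:
		// A transport-level failure: keep the cause available via %w.
		return fmt.Errorf("self-check: %w", err)
	case !ok:
		// Failed response with a nil error: still report a failure.
		return errSelfCheck
	default:
		return nil
	}
}
```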
On WSL 1, the routing table does not contain a default route, causing
ctrld to fail to get the default interface for setting DNS. However,
WSL 1 only uses /etc/resolv.conf for setting up DNS, so the interface
does not matter, because the setting is applied globally anyway.
To fix it, just return "lo" as the default interface name on WSL 1.
While at it, also remove the useless service.Logger call, which is not
unified with the current logger and may cause false positives on
systems where syslog is not configured properly (like WSL 1).
Also pass the real error to the backoff when doing the self-check, so
we don't have to use a placeholder error.
On some Merlin routers, the clock is wrong after a system reboot, and
ctrld needs to wait for NTP to sync to get the correct time. To fetch
the API successfully in cd mode, ctrld needs to wait until NTP has set
the time correctly; otherwise, certificate validation fails.
This reverts commit 00fe7f59d13774f2ea6c325bdbb8165be58a1edd.
The purpose was to disable cd mode for an already installed service,
which is a harder problem than we thought. So leave it out of the v1.2
cycle.
When writing the default config file, the content must be marshalled to
the config object first before writing to disk.
While at it, also use the full path of the default config file to make
it clear to the user where the config is written.
This commit adds the ability for ctrld to gather client information,
including MAC/IP/hostname, and send it to the Control-D server,
controlled by a per-upstream config.
- Add the send_client_info upstream config.
- Read/watch dnsmasq leases files on supported platforms.
- Add the corresponding client info to the DoH query headers.
All of these only apply to Control-D upstreams, though.
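To make the flow concrete, the sketch below parses one IPv4 line of a
dnsmasq leases file (fields: expiry, MAC, IP, hostname, client ID) and
turns the result into request headers. The type and function names are
hypothetical, and the header names are placeholders chosen for
illustration, not the names the Control-D server actually expects.

```go
package main

import (
	"net/http"
	"strings"
)

// clientInfo is a hypothetical record for one LAN client.
type clientInfo struct {
	MAC, IP, Hostname string
}

// parseLease parses one IPv4 line of a dnsmasq leases file:
// "<expiry> <mac> <ip> <hostname> <client-id>".
// Lines with fewer than four fields (for example "duid ...") are skipped.
func parseLease(line string) (clientInfo, bool) {
	f := strings.Fields(line)
	if len(f) < 4 {
		return clientInfo{}, false
	}
	ci := clientInfo{MAC: f[1], IP: f[2], Hostname: f[3]}
	if ci.Hostname == "*" { // dnsmasq writes "*" for an unknown hostname
		ci.Hostname = ""
	}
	return ci, true
}

// clientInfoHeader builds headers carrying the client info.
// The header names are placeholders, not the real protocol.
func clientInfoHeader(ci clientInfo) http.Header {
	h := http.Header{}
	for name, value := range map[string]string{
		"X-Client-Mac":  ci.MAC,
		"X-Client-Ip":   ci.IP,
		"X-Client-Host": ci.Hostname,
	} {
		if value != "" {
			h.Set(name, value)
		}
	}
	return h
}
```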
So we don't have to depend on network stack probing to decide whether
IPv4 or IPv6 will be used.
While at it, also prevent a race report when doing the same parallel
resolving for the OS resolver, even though this race is harmless.
Otherwise, we experience a slow ctrld start after rebooting, because
the network check continuously reports a failed status even though the
network is up. By restoring the DNS before stopping, we leave the
network state as default; once ctrld starts, the DNS is configured
again.