The default gateway is usually the DNS server in normal home network
setup for most users. However, there's case that it is not, causing
discover ptr failed.
This commit add discover_ptr_endpoints config parameter, so users can
define what DNS nameservers will be used.
Using the same approach as in cd mode, but do it only once when running
ctrld the first time, then the config will be re-used then.
While at it, also adding Dockerfile.debug for better troubleshooting
with alpine base image.
Config fetching/generating in cd mode is currently weird, error prone,
and easy for user to break ctrld when using custom config.
This commit reworks the flow:
- Fetching config from Control D API.
- No custom config, use the current default config.
- If custom config presents, but there's no listener, use 0.0.0.0:53.
- Try listening on current ip+port config, if ok, ctrld could be a
direct listener with current setup, moving on.
- If failed, trying 127.0.0.1:53.
- If failed, trying current ip + port 5354
- If still failed, pick a random ip:port pair, retry until listening ok.
With this flow, thing is more predictable/stable, and help removing the
Config interface for router.
dohTransport returns a http.RoundTripper. When pinging upstream, we do
it both for doh and doh3, and checking whether the transport is nil
before performing the check.
However, dohTransport returns a concrete *http.Transport. Thus
dohTransport will always return a non-nil http.Roundtripper, causing
invalid memory dereference when upstream is configured to use doh3.
Performing ping upstream separately will fix the issue.
The current transport setup is using mutex lock for synchronization.
This could work ok in normal device, but on low capacity routers, this
high contention may affect the performance, causing ctrld hangs.
Instead of using mutex lock, using atomic operation for synchronization
yield a better performance:
- There's no lock, so other requests won't be blocked. And even theses
requests use old broken transport, it would be fine, because the
client will retry them later.
- The setup transport is now done once, on demand when the transport is
accessed, or when signal rebootsrapping. The first call to
dohTransport will block others, but the transport is warmup before
ctrld start serving requests, so client requests won't be affected.
That helps ctrld handling the requests better when running on low
capacity device.
Further more, the transport configuration is also tweaked for better
default performance:
- MaxIdleConnsPerHost is set to 100 (default is 2), which allows more
connections to be reused, reduce the load to open/close connections
on demand. See [1] for a real example.
- Due to the raising of MaxIdleConnsPerHost, once the transport is
GC-ed, it must explicitly close its idle connections.
- TLS client session cache is now enabled.
Last but not least, the upstream ping process is also reworked. DoH
transport is an HTTP transport, so doing a HEAD request is enough to
warmup the transport, instead of doing a full DNS query.
[1]: https://gitlab.com/gitlab-org/gitlab-pages/-/merge_requests/274
Currently, there's no upper bound for how many requests that ctrld will
handle at a time. This could be problem on some low capacity routers,
where CPU/RAM is very limited.
This commit adds a configuration to limit how many requests that will be
handled concurrently. The default is 256, which should works well for
most routers (the default concurrent requests of dnsmasq is 150).
In split mode, the code must check for ipv6 availability to return the
correct network stack. Otherwise, we may end up using "tcp6-tls" even
though the upstream IP is an ipv4.
When bootstrapping, if the network changed, for example, firewall rules
changed during VPN connection, the bootstrap IPs may not be resolved, so
ctrld won't work. Since bootstrap IPs is necessary for ctrld to work
properly, we should wait until we can resolve upstream IP before we can
start serving requests.
This commit add the ability for ctrld to gather client information,
including mac/ip/hostname, and send to Control-D server through a
config per upstream.
- Add send_client_info upstream config.
- Read/Watch dnsmasq leases files on supported platforms.
- Add corresponding client info to DoH query header
All of these only apply for Control-D upstream, though.
So we don't have to depend on network stack probing to decide whether
ipv4 or ipv6 will be used.
While at it, also prevent a race report when doing the same parallel
resolving for os resolver, even though this race is harmless.
The bootstrap process has two issues that can make ctrld stop resolving
after restarting machine host.
ctrld uses bootstrap DNS and os nameservers for resolving upstream. On
unix, /etc/resolv.conf content is used to get available nameservers.
This works well when installing ctrld. However, after being installed,
ctrld may modify the content of /etc/resolv.conf itself, to make other
apps use its listener as DNS resolver. So when ctrld starts after OS
restart, it ends up using [bootstrap DNS + ctrld's listener], for
resolving upstream. At this moment, if ctrld could not contact bootstrap
DNS for any reason, upstream domain will not be resolved.
For above reason, an upstream may not have bootstrap IPs after ctrld
starts. When re-bootstrapping, if there's no bootstrap IPs, ctrld should
call the setup bootstrap process again. Currently, it does not, causing
all queries failed.
This commit fixes above issue by adding mechanism for retrieving OS
nameservers properly, by querying routing table information:
- Parsing /proc/net subsystem on Linux.
- For BSD variants, just fetching routing information base from OS.
- On Windows, just include the gateway information when reading iface.
The fixing for second issue is trivial, just kickoff a bootstrap process
if there's no bootstrap IPs when re-boostrapping.
While at it, also ensure that fetching resolver information from
ControlD API is also used the same approach.
Fixes#34
We only need on demand information when re-bootstrapping. On Bootsrap,
this is already checked by ctrldnet.Up, so on demand query will cause
un-necessary slow down if external ipv6 is slow to response.