This change improves compatibility with newer UniFi OS versions while
maintaining backward compatibility with UniFi OS 4.2 and earlier.
The refactoring also reduces code duplication and improves maintainability
by centralizing dnsmasq configuration path logic.
Generally, using /jffs/scripts/dnsmasq.postconf is the right way to add
custom configuration to dnsmasq on Merlin. However, we have seen many
reports that the postconf does not work on their devices.
This commit changes how dnsmasq config manipulation is done on Merlin,
so it's expected to work on all Merlin devices:
- Writing /jffs/scripts/dnsmasq.postconf script
- Copy current dnsmasq.conf to /jffs/configs/dnsmasq.conf
- Run postconf script directly on /jffs/configs/dnsmasq.conf
- Restart dnsmasq
This way, the /jffs/configs/dnsmasq.conf will contain both current
dnsmasq config, and also custom config added by ctrld, without worrying
about conflicting, because configuration was added by postconf.
See (1) for more details about custom config files on Merlin.
(1) https://github.com/RMerl/asuswrt-merlin.ng/wiki/Custom-config-files
When ctrld performs upgrading tasks, the current binary would be moved
to different file, thus the executable will return this new file name,
instead of the old "/path/to/ctrld".
The config path on FreshTomato is located in the same directory with
ctrld binary, with ".startup" suffix. So when the binary was moved
during upgrading, the config path is located wrongly.
To fix it, read the binary path from service config first, then only
fallback to the current executable if the path is empty (this is the
same way ctrld is doing for other router platforms).
openwrt 24.10 changes the dnsmasq default config path, causing breaking
changes to softwares which depends on old behavior.
This commit adds a workaround for the issue, by querying the actual
config directory from ubus service list, instead of relying on the
default hardcode one.
Because ctrld needs to query custom client mapping from it.
While at it, also make the error message clearer when initializing ubios
discover failed, by attaching the command output to returned error.
For safety reason, ctrld will create a backup of the current binary when
running upgrade command.
However, on systems where ctrld status is got by parsing ps command
output, the current binary path is important and must be the same with
the original binary. Depends on kernel version, using os.Executable may
return new backup binary path, aka "ctrld_previous", not the original
"ctrld" binary. This causes upgrade command see ctrld as not running
after restart -> upgrade failed.
Fixing this by recording the binary path before creating new service, so
the ctrld service status can be checked correctly.
Since ctrld now supports MAC rules, the client's mac and ip must always
be sent to ctrld. Otherwise, the mac policy won't work when ctrld is an
upstream of dnsmasq.
On some routers, dnsmasq config may change cache-size dynamically after
ctrld starts, causing dnsmasq crashes.
Fixing this by using max-cache-ttl, which have the same effect with
setting cache-size=0 but won't conflict with existing routers config.
The dnsmasq cache-size setting on EdgeOS could be re-generated anytime
by vyatta router/dhcp components. This conflicts with setting generated
by ctrld, causing dnsmasq fails to start.
It's better to keep dnsmasq cache enabled on EdgeOS, we can turn it off
again once we find a reliable way to control cache-size setting.
The postconf script added by ctrld requires all of these conditions to
work correctly:
- /proc, /tmp were mounted.
- dnsmasq is running.
Currently, ctrld is only waiting for NTP ready, which may not ensure
both of those conditions are true. Explicitly checking those conditions
is a safer approach.
So ctrld can record the raw/original client IP instead of looking up
from MAC to IP, which may not the right choice in some network setup
like using wireguard/vpn on Merlin router.
The only reason that forces ctrld to depend on vyatta-dhcpd service on
EdgeOS is allowing ctrld to watch lease files properly, because those
files may not be created at the time client info table initialized.
However, on some EdgeOS version, vyatta-dhcpd could not start with an
empty config file, causing restart loop itself, flooding systemd log,
making the router run out of memory.
To fix this, instead of depending on vyatta-dhcpd, we should just watch
for lease files creation, then adding them to watch list.
While at it, also making ctrld starts after nss-lookup, ensuring we have
a working DNS before starting ctrld.
On routers where we want to wait for NTP by checking nvram key. Before
waiting, we clean up the router to ensure it's restored to original
state. However, router.Cleanup is not idempotent, causing dnsmasq
restarted. On tomato/ddwrt, restarting have no delay, and spawning new
dnsmasq process immediately. On merlin, somehow it takes time to spawn
new dnsmasq process, causing ctrld wrongly think there's no one
listening on port 53.
Fixing this by ensuring router.Cleanup is idempotent. While at it, also
adding "ntp_done" to nvram key, which is now using on latest ddwrt.
For local config, we don't want to alter what user explicitly set, and
only try filling in missing value.
While at it, also remove the dnsmasq port delete on openwrt, we don't
need that hack anymore.
Config fetching/generating in cd mode is currently weird, error prone,
and easy for user to break ctrld when using custom config.
This commit reworks the flow:
- Fetching config from Control D API.
- No custom config, use the current default config.
- If custom config presents, but there's no listener, use 0.0.0.0:53.
- Try listening on current ip+port config, if ok, ctrld could be a
direct listener with current setup, moving on.
- If failed, trying 127.0.0.1:53.
- If failed, trying current ip + port 5354
- If still failed, pick a random ip:port pair, retry until listening ok.
With this flow, thing is more predictable/stable, and help removing the
Config interface for router.
On some Merlin routers reported by users, ctrld some how is not stopped
properly. So the router does not have a working DNS at boot time to do
ntp synchronization.
To fix it, just clean up the router before start waiting for ntp ready.
Firewalla ignores 127.0.0.1 in all VLAN config, so making 127.0.0.1 as
dnsmasq upstream would break thing when multiple VLAN presents.
To deal with this, we need to gather all interfaces available, and
making them as upstream of dnsmasq. Then changing ctrld to listen on all
interfaces, too.
It also leads to better improvement for dnsmasq configuration template,
as the upstream server can now be generated dynamically instead of hard
coding to 127.0.0.1:5354.
On Firewalla, lo interface is excluded in all dnsmasq settings of all
interfaces, to prevent conflicts. The one that ctrld adds in
dnsmasq_local directory could not work if there're multiple dnsmasq
configs for multiple interfaces (real example from an user who uses
VLAN in router setup).
Instead, if we detect 127.0.0.1 on Firewalla, fallback to "br0"
interface IP address instead.
When running on routers, ctrld leverages default setup, let dnsmasq runs
on port 53, and forward queries to ctrld listener on port 5354. However,
this setup is not serialized to config file, causing confusion to users.
Fixing this by writing the correct routers setup to config file. While
at it, updating documentation to refelct that, and also adding note that
changing default router setup could break things.
UniFi Gateway (USG) uses its own DNS forwarding rule, which is
configured default in /etc/dnsmasq.conf file. Adding ctrld own config in
/etc/dnsmasq.d won't take effects. Instead, we must make changes
directly to /etc/dnsmasq.conf, configuring ctrld as the only upstream.
The current state of ctrld is very "high stakes" and easy to mess up,
and is unforgiving when "ctrld start" failed. That would cause the
router is in broken state, unrecoverable.
This commit makes these changes to improve the state:
- Moving router setup process after ctrld listeners are ready, so
dnsmasq won't flood requests to ctrld even though the listeners are
not ready to serve requests.
- On router, when ctrld stopped, restore router DNS setup. That leaves
the router in good state on reboot/startup, help removing the custom
DNS server for NTP synchronization on some routers.
- If self-check failed, uninstall ctrld to restore router to good
state, prevent confusion that ctrld process is still running even
though self-check reports it did not started.