VPN clients often have empty MAC address, because they come from virtual
network interface. However, there's other setup/devices also create
virtual interface, but is not VPN.
Changing source of those clients to empty to prevent confustion in
clients list command output.
RMM uses non-user account which results in config + socket file being
written to a random directory, which is not a real directory that can be
accessed.
Fix this by using directory of ctrld binary as user home dir.
Some users mentioned that when there is an Internet outage, ctrld fails
to recover, crashing or locks up the router. When requests start
failing, this results in the clients emitting more queries, creating a
resource spiral of death that can brick the device entirely.
To guard against this case, this commit implement an upstream monitor
approach:
- Marking upstream as down after 100 consecutive failed queries.
- Start a goroutine to check when the upstream is back again.
- When upstream is down, answer all queries with SERVFAIL.
- The checking process uses backoff retry to reduce high requests rate.
- As long as the query succeeded, marking the upstream as alive then
start operate normally.
Currently, ctrld assumes that NetworkManager is not available if writing
to /etc/NetworkManager/conf.d return directory not exist error. That
would work on most Linux distros. However, cloud provider may do some
hacks, causing ctrld confusion and think that NetworkManager is
available.
Fixing this by checking whether NetworkManager binary presents first.
While at it, also fixing a bug when restarting NetworkManager failed
causing ctrld hangs. The go-systemd library is not clear about this, but
the waitCh channel won't never be closed if error occurred, so we must
return immediately instead of receiving from it blindly.
So ctrld can record the raw/original client IP instead of looking up
from MAC to IP, which may not the right choice in some network setup
like using wireguard/vpn on Merlin router.
The current approach to get default route IP is finding the LAN
interface with the same MAC address. However, there could be multiple
interfaces like that, making ctrld confused.
This commit fixes this issue, by listing all possible private IPs, then
sorting them and use the smallest one for router self queries.
For reporting router queries, ctrld uses private IP of the default route
interface. However, when the default route is conntected directly to
ISP, the interface will have a public IP, and another interface with the
same MAC address will be created for LAN ip. So when no private IP found
for default route interface, ctrld must look at the other interface to
find the corret LAN ip.
The only reason that forces ctrld to depend on vyatta-dhcpd service on
EdgeOS is allowing ctrld to watch lease files properly, because those
files may not be created at the time client info table initialized.
However, on some EdgeOS version, vyatta-dhcpd could not start with an
empty config file, causing restart loop itself, flooding systemd log,
making the router run out of memory.
To fix this, instead of depending on vyatta-dhcpd, we should just watch
for lease files creation, then adding them to watch list.
While at it, also making ctrld starts after nss-lookup, ensuring we have
a working DNS before starting ctrld.
Using the same approach as in cd mode, but do it only once when running
ctrld the first time, then the config will be re-used then.
While at it, also adding Dockerfile.debug for better troubleshooting
with alpine base image.
- Watch more events for lease file changes
- Improving network up detection by using bootstrap IPv6 along side
IPv4 one.
- Emitting log to notice user that ctrld is starting.
- Using systemd wrapper to provide correct status.
- Restoring DNS on stop on Windows.
On routers where we want to wait for NTP by checking nvram key. Before
waiting, we clean up the router to ensure it's restored to original
state. However, router.Cleanup is not idempotent, causing dnsmasq
restarted. On tomato/ddwrt, restarting have no delay, and spawning new
dnsmasq process immediately. On merlin, somehow it takes time to spawn
new dnsmasq process, causing ctrld wrongly think there's no one
listening on port 53.
Fixing this by ensuring router.Cleanup is idempotent. While at it, also
adding "ntp_done" to nvram key, which is now using on latest ddwrt.