
NetworkManager and failing DNS resolution for local domain names

I recently had to reinstall the OS on my Linux workstation and noticed I was having difficulty resolving hostnames for my local domain. The underlying problems turned out to be more complicated than expected, so I figured they were worth documenting. Before I continue, it is worth noting that my workstation was not on the network while I installed Xubuntu, so the configuration might have been set up correctly automatically if it had been connected during installation.

The Problem

I run a BIND nameserver for my homelab on 192.168.1.2. For the purposes of this post, we will say I'm using the domain name myhome.local. I had already confirmed from another host that the nameserver was configured correctly, and that DHCP was providing the correct primary DNS address.

From my new OS, I was unable to resolve my internal hostnames without explicitly specifying the nameserver to use. This only happened with internal hostnames; all external hostnames resolved perfectly fine. I reached this conclusion through the following line of troubleshooting.

Test: Verify that I can resolve an external hostname.
Conclusion: Name resolution works to an extent. I am querying a local nameserver for some reason though.

user@workstation:~$ nslookup www.google.com
Server:         127.0.1.1
Address:        127.0.1.1#53

Non-authoritative answer:
Name:	www.google.com
Address: 172.217.3.196

Test: Verify that I can't resolve an internal hostname.
Conclusion: The apparent nameserver on my workstation can't resolve the internal address. There's no indication that the hostname resolution process is reaching the proper nameserver located at 192.168.1.2.

user@workstation:~$ nslookup www.myhome.local
Server:         127.0.1.1
Address:        127.0.1.1#53

** server can't find www.myhome.local: NXDOMAIN

Test: Check if I can resolve a hostname by forcing the intended nameserver.
Conclusion: The nameserver works, confirming that my workstation isn't pointing to the correct nameserver.

user@workstation:~$ nslookup www.myhome.local 192.168.1.2
Server:         192.168.1.2
Address:        192.168.1.2#53

Name:	www.myhome.local
Address: 192.168.1.20

Test: Given the information I've seen, I shouldn't be able to ping the hostname, but I thought it was best to check (this test seems somewhat redundant, but is important later).
Conclusion: Ping failed to resolve the hostname as expected.

user@workstation:~$ ping www.myhome.local
ping: unknown host www.myhome.local

Test: Make sure the IP address for the web server works.
Conclusion: The host is reachable on the network via its IP. Technically this was not needed, but it's comforting to know the host exists during this existential crisis.

user@workstation:~$ ping 192.168.1.20
PING 192.168.1.20 (192.168.1.20) 56(84) bytes of data.
64 bytes from 192.168.1.20: icmp_seq=1 ttl=64 time=3.86 ms
^C

Okay, this was a bunch of tests. What facts do I know at this point? Let's go over our list:

  • There appears to be a nameserver running on my localhost (workstation).
  • The local nameserver can resolve external hostnames such as www.google.com.
  • The local nameserver cannot resolve internal hostnames such as www.myhome.local.
  • Internal hostnames do resolve when you query the nameserver at 192.168.1.2 directly using nslookup.
  • Internal hostnames are not resolving when you attempt to ping the hostname.

First, let's figure out what's up with this DNS server that appears to be running on my workstation.

Disable dnsmasq forwarding within NetworkManager

In short, I eventually found this Stack Exchange question, which explained what was happening here. Many thanks to users 2707974 and Kaii for their answers to the posted question.

NetworkManager was configured to forward all DNS resolution to its own running instance of dnsmasq. It had dnsmasq configured simply to forward requests it received at 127.0.1.1 to whatever nameserver IP was provided by DHCP. This resulted in the /etc/resolv.conf file looking like the following:

user@workstation:~$ cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.1.1
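
A quick way to spot this situation on another machine is to check whether resolv.conf lists a loopback resolver. The following is only a minimal sketch and was not part of my original troubleshooting; the function name uses_local_stub and the scratch file are made up for illustration.

```shell
# Succeed if a resolv.conf-style file lists a nameserver in 127.0.0.0/8,
# which suggests a local forwarder (such as dnsmasq) sits in front of DNS.
# uses_local_stub is a hypothetical helper name, not a standard tool.
uses_local_stub() {
    awk '$1 == "nameserver" && $2 ~ /^127\./ { found = 1 } END { exit !found }' "$1"
}

# Demo against a scratch file that mirrors the output above.
sample="$(mktemp)"
printf 'nameserver 127.0.1.1\n' > "$sample"

if uses_local_stub "$sample"; then
    echo "local forwarder in use"
else
    echo "querying upstream nameserver directly"
fi
```

On a real system you would pass /etc/resolv.conf instead of the scratch file.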

The solution lies within the [main] section of the NetworkManager config file located at /etc/NetworkManager/NetworkManager.conf. The dnsmasq forwarding feature can be disabled by removing or commenting out the dns=dnsmasq line and then restarting the NetworkManager service. For example, my NetworkManager.conf file now looks as follows:

user@workstation:~$ cat /etc/NetworkManager/NetworkManager.conf 
[main]
plugins=ifupdown,keyfile,ofono

[ifupdown]
managed=false
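
For anyone who prefers to script the change, the edit can be sketched roughly as below. This runs against a scratch copy rather than the real file; the "before" contents (with dns=dnsmasq in the [main] section) are my assumption of what the file looked like prior to the change, and the commented-out commands at the end assume a systemd-based install.

```shell
# Build a scratch copy that stands in for the config before the change.
# The dns=dnsmasq line here is an assumed "before" state.
conf="$(mktemp)"
printf '%s\n' \
    '[main]' \
    'plugins=ifupdown,keyfile,ofono' \
    'dns=dnsmasq' \
    '' \
    '[ifupdown]' \
    'managed=false' > "$conf"

# Comment the line out so NetworkManager stops starting its own dnsmasq.
sed -i 's/^dns=dnsmasq$/#dns=dnsmasq/' "$conf"
cat "$conf"

# On the real system (assumes systemd manages NetworkManager):
#   sudo sed -i 's/^dns=dnsmasq$/#dns=dnsmasq/' /etc/NetworkManager/NetworkManager.conf
#   sudo systemctl restart NetworkManager
```

Commenting the line out instead of deleting it makes the change easy to revert later.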

To be on the safe side, I checked to make sure the resolv.conf file updated as expected:

user@workstation:~$ cat /etc/resolv.conf  
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.1.2

With the local nameserver disabled and out of the way, we can replay our series of troubleshooting steps. I'll abbreviate this and just show the name resolution and ping tests:

Test: Make sure I can resolve an internal address without specifying the nameserver to use.
Conclusion: The test works! I should be able to ping that host via its hostname now, right?

user@workstation:~$ nslookup www.myhome.local
Server:         192.168.1.2
Address:        192.168.1.2#53

Name:	www.myhome.local
Address: 192.168.1.20

Test: Attempt to ping the host using its hostname (that we know resolves correctly).
Conclusion: Ping failed to resolve the hostname, even though we should be querying the nameserver correctly. This was not expected.

user@workstation:~$ ping www.myhome.local
ping: unknown host www.myhome.local

The results of each command tell us the following:

  • There is no nameserver on the localhost.
  • Running nslookup will resolve external hostnames such as www.google.com.
  • Running nslookup will resolve internal hostnames such as www.myhome.local.
  • Internal hostnames are not resolving when you attempt to ping the hostname.

This is where stuff got a little odd. My /etc/resolv.conf file says I'm querying the internal nameserver now, so why isn't www.myhome.local resolving for ping? This is when I learned more about the Name Service Switch (NSS).

Alter DNS priority for hostname resolution in NSS

Like before, I eventually found an explanation for ping's strange behavior. As this Stack Exchange answer explains, ping uses the NSS configuration rather than immediately using the /etc/resolv.conf file. This means there's something wrong with the hosts configuration within the /etc/nsswitch.conf file.
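
To see the difference without involving ping, getent can be used, since it resolves names through NSS just like most ordinary programs do, whereas nslookup talks to the nameserver directly. This is a small sketch I'm adding for illustration; nss_lookup is a made-up wrapper name.

```shell
# Resolve a name through NSS (/etc/nsswitch.conf), the same path ping takes,
# rather than querying the nameserver from /etc/resolv.conf directly.
# nss_lookup is a hypothetical wrapper around the standard getent tool.
nss_lookup() {
    if getent hosts "$1"; then
        return 0
    fi
    echo "NSS could not resolve $1" >&2
    return 1
}

# localhost is normally answered by the "files" source (/etc/hosts).
nss_lookup localhost
```

Running nss_lookup www.myhome.local at this point in the troubleshooting should fail in the same way ping does, even though nslookup succeeds.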

My nsswitch.conf file looked almost exactly like the lines posted in that solution (as far as the placement of the "dns" entry goes).

user@workstation:~$ cat /etc/nsswitch.conf | grep hosts
hosts:          files mdns4_minimal [NOTFOUND=return] dns

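In this order, mdns4_minimal claims every name ending in .local for multicast DNS, and [NOTFOUND=return] stops the lookup as soon as it reports a miss, so the "dns" source is never consulted for www.myhome.local. Initially I moved the "dns" entry between the mdns4_minimal and [NOTFOUND=return] entries, but that resulted in internal hostname resolution times of about 1 second, which wasn't quite fast enough (#NeedForSpeed). I then moved it forward, between the files and mdns4_minimal entries, like so: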

user@workstation:~$ cat /etc/nsswitch.conf | grep hosts
hosts:          files dns mdns4_minimal [NOTFOUND=return]
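
I made this edit by hand, but the same reordering could be scripted along these lines. Treat it as a sketch only: it works on a scratch copy, and the sed pattern assumes the hosts line matches the original one shown earlier exactly.

```shell
# Scratch copy holding the original hosts line from nsswitch.conf.
nss="$(mktemp)"
printf 'hosts:          files mdns4_minimal [NOTFOUND=return] dns\n' > "$nss"

# Move "dns" ahead of mdns4_minimal so unicast DNS is tried before mDNS.
# The pattern only matches the exact original line; anything else is left alone.
sed -i -E 's/^(hosts:[[:space:]]+)files mdns4_minimal \[NOTFOUND=return\] dns$/\1files dns mdns4_minimal [NOTFOUND=return]/' "$nss"

grep '^hosts:' "$nss"
```

Matching the full original line keeps the script from mangling a hosts entry that has already been customized.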

Conclusion

We don't have to retest everything, since it was just ping that was having problems. Still, a quick series of tests indicates that hostname resolution is now working as expected:

user@workstation:~$ nslookup www.google.com
Server:         192.168.1.2
Address:        192.168.1.2#53

Non-authoritative answer:
Name:	www.google.com
Address: 172.217.3.196

user@workstation:~$ nslookup www.myhome.local
Server:         192.168.1.2
Address:        192.168.1.2#53

Name:	www.myhome.local
Address: 192.168.1.20

user@workstation:~$ ping www.myhome.local
PING www.myhome.local (192.168.1.20) 56(84) bytes of data.
64 bytes from www.myhome.local (192.168.1.20): icmp_seq=1 ttl=64 time=3.40 ms
64 bytes from www.myhome.local (192.168.1.20): icmp_seq=2 ttl=64 time=5.73 ms
^C

With the DNS issues fixed, we're clear to continue work as planned!