We're seeing occasional DNS broadcast failures (IP address showing instead of hostname) on DtD links of perhaps a half dozen nodes. Sometimes they'll go away after several hours or several days, sometimes they persist. (See attached screenshot).
All of the affected nodes are running the latest production release code. They're all high-level nodes, all Rockets (which may not be relevant).
Any ideas?
Orv W6BI
That appears to be a bug. I have the same issue here appearing in Current Neighbors.
I think I figured out the problem. The node it was happening on was a PBE-M5-400 with a gigabit Ethernet interface. I ran the commands Joe, AE6XE, has given out and the problem is gone. It must be some issue on boot-up when the eth0 interface is being negotiated that's hosing up the DNS.
Anyway, run the following:
opkg install http://downloads.openwrt.org/releases/packages-18.06/mips_24kc/base/ethtool_4.15-1_mips_24kc.ipk
ethtool eth0
ethtool -s eth0 speed 100 duplex full autoneg off
Andre, Rockets don't have Gigabit interfaces, so I don't think that was the issue. But the nodes in question seem to have node names exactly 32 characters. That's exactly half of the maximum length for the node name - 64 characters. How long are your problematic node names?
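For anyone who wants to check name lengths quickly from a shell, here is a minimal sketch. The function names are made up for illustration, and 32 is only the length I noticed on the affected nodes, not a confirmed trigger.

```shell
# Hypothetical helpers for checking node name length (not part of AREDN).

# name_length NAME: print the number of characters in NAME.
name_length() {
    # ${#1} expands to the character count of the first argument
    printf '%s\n' "${#1}"
}

# check_name NAME: print the length and flag names of exactly 32 characters.
check_name() {
    len=$(name_length "$1")
    if [ "$len" -eq 32 ]; then
        echo "$1: $len characters (matches the suspect length)"
    else
        echo "$1: $len characters"
    fi
}

# On the node itself the name could be read from the kernel, e.g.:
# check_name "$(cat /proc/sys/kernel/hostname)"
```

Running check_name against each problematic node name would confirm or rule out the 32-character pattern.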
Orv and Andre,
While navigating our network over RF-only paths, through nodes running several older and newer code versions, we ran into this DNS issue: we couldn't reach an immediate neighbor whose node name was listed on a mesh status screen. To work around it, we used the nodes running older code to look up the IP addresses in their OLSR module screens. We were then able to reach the desired node.
All our node names are short, and this issue is being seen on nodes with older code, so I wouldn't jump on the current 3.18.9 code as the source of this defect.
While the dropping of the OLSR GUI on port 1978 was helpful in what it freed in the way of resources, maybe we should consider a second type of mesh status screen with just IP addresses or one that combines them both?
We also might want to make the periodicity of OLSR broadcast adjustable along with the number of hops. A poor man's BGP alternative could be set up between cooperative sets of nodes through administration and reduce this traffic among stable partners.
Thoughts?
73,
Gordon Beattie, W2TTT
201.314.6964
I suspect the problem is not on the node displaying the IP address, but rather on the neighbor node, which is not sending out a hostname for others to learn. This may be complicated if olsr continues to cache the host/IP of nodes, e.g. when a node is renamed, the old name still hangs around for a while on nodes across the mesh.
You should be able to reach the node by using the IP address directly. If you have this scenario, then install the "tcpdump-mini" package and capture data with this command (it assumes an RF path; if the path is over a cat5 cable, change wlan0-1 to eth0.2):
tcpdump -i wlan0-1 -c 1000 -w /tmp/my-node-name.pcap port 698
This may take several minutes to collect 1000 packets. Then send me this data file along with the support download.
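To avoid typos when switching between the RF and cat5 cases, the capture command can be built by a small helper. This is only a sketch: capture_cmd and its "rf"/"cat5" arguments are hypothetical names, while the interfaces, packet count, and port are the ones given above. It prints the command rather than running it, so you can review it first.

```shell
# capture_cmd LINK NODE_NAME
# Hypothetical helper: print the tcpdump command for an OLSR capture.
# LINK is "rf" (wlan0-1) or "cat5" (eth0.2), as described in this post.
capture_cmd() {
    case "$1" in
        rf)   iface="wlan0-1" ;;
        cat5) iface="eth0.2" ;;
        *)    echo "usage: capture_cmd rf|cat5 NODE_NAME" >&2
              return 1 ;;
    esac
    # 1000 packets on port 698, written to /tmp
    echo "tcpdump -i $iface -c 1000 -w /tmp/$2.pcap port 698"
}
```

For example, capture_cmd cat5 my-node-name prints the eth0.2 variant of the command, which can then be pasted into the node's shell.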
Now, reboot this node and/or run "/etc/init.d/olsrd restart" and "/etc/init.d/dnsmasq restart". Did that make the symptoms go away?
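One way to tell whether the restart changed anything is to check whether the neighbor's IP address has a name in the hosts file that the OLSR name service maintains. The sketch below assumes that file is /var/run/hosts_olsr on your build and that it uses the usual "IP name" hosts layout; lookup_name is a hypothetical helper, and the file path is passed in so you can point it elsewhere.

```shell
# lookup_name HOSTS_FILE IP
# Hypothetical helper: print every hostname listed for IP in a hosts-style
# file. Returns 1 (and prints nothing) when the IP has no name entry.
lookup_name() {
    awk -v ip="$2" '
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break   # stop at a trailing comment
                print $i
                found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example (path is an assumption, IP is the neighbor showing without a name):
# lookup_name /var/run/hosts_olsr 10.176.140.239 || echo "no name known"
```

If the IP has no entry both before and after the restart, that would point at the neighbor not advertising its name rather than at the local node.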
Joe AE6XE
Orv W6BI
root@WD6EBY-LA-MtWilson-SE:~# cd /etc/init.d
root@WD6EBY-LA-MtWilson-SE:/etc/init.d# ./olsrd restart
packet_write_wait: Connection to 10.176.140.239 port 2222: Broken pipe
[obeach@jethro_house temp]$ ssh -p 2222 root@10.176.140.239
ssh: connect to host 10.176.140.239 port 2222: Connection refused
<reconnected>
root@WD6EBY-LA-MtWilson-SE:~# cd /etc/init.d
root@WD6EBY-LA-MtWilson-SE:/etc/init.d# ./dnsmasq restart
/etc/rc.common: line 1: can't create /tmp/hosts/dhcp: nonexistent directory
/etc/rc.common: line 1: can't create /tmp/hosts/dhcp: nonexistent directory
udhcpc: started, v1.28.3
udhcpc: sending discover
udhcpc: no lease, failing
root@WD6EBY-LA-MtWilson-SE:/etc/init.d#
Orv W6BI
We'll have to dig into OLSR to find out why it is not performing its function, and why this is infrequent or not easily repeatable. If anyone finds repeatable steps and/or particular configuration to reproduce, please post. Half the battle sometimes is reproducing an issue.
Joe AE6XE