OLSRD restarts
We have about 25 nodes connected, several with many services. Not every connection is 100%; many are not line of sight because of the terrain. We are running V3.16.1.0b.02. The nodes have been running for 78 days without reboots, but OLSRD keeps restarting. In the 78 days many of the nodes have had 1000 to 1200 restarts, and the main connection node from North to South has had 2100 OLSRD restarts. Several times when we accessed a node it became unavailable because it was in OLSRD restart mode. We are trying to understand what is happening. Suggestions have been that the many services hanging on the node are causing problems, or that with intermittent connections the OLSRD wait time is exceeded and OLSRD goes into restart mode. We have isolated three main node connections (changed the SSID so they don't connect to anything else) and so far there have been no OLSRD restarts in a day. The LQs are 63 and NLQ 100 (not all 100-100). Before the change of SSID the LQ and NLQ were 40-80%.
I have a dump of the worst node before we changed the SSID if that will help.
WB6WGM
Joe AE6XE
Another node that was attached does have jabber on it. That node is now off the mesh.
A ticket against the closed beta will likely be closed without action, or at least met with a request to reproduce the problem in stable, since betas by their nature are not recommended for production and are expected to have possible flaws.
If 3.16.1.0 were still in beta that would be one thing, but since that beta has closed out and a stable release exists there is little point in filing a ticket against it.
Regards, Robert, WB6WGM
Joe AE6XE
I recently received the download from the node (KK6ISP) that, when added to the mesh, caused KF6LCS to start having OLSRD restarts. The number has been drastically reduced since we upgraded to the released version. See attached.
Robert, WB6WGM
Robert, I'm not seeing anything to help find why olsrd restarted. How many restarts are you still seeing?
What is happening is the olsr program overwrites/creates a temp file at 5-second intervals, a heartbeat. Another program wakes up every 15 seconds and, if this file hasn't been overwritten, restarts the olsr program. So for some unknown reason, the olsr program has either crashed or stopped responding for 15 seconds.
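To make the heartbeat/watchdog relationship concrete, here is a minimal Python sketch of the idea. It is not the firmware's actual code: the function names, the file path, and the use of the file's modification time to decide staleness are all assumptions for illustration. Only the 5-second and 15-second intervals come from the description above.

```python
import os
import time

HEARTBEAT_INTERVAL = 5   # seconds between heartbeat writes (from the post)
WATCHDOG_INTERVAL = 15   # seconds between watchdog wake-ups (from the post)


def write_heartbeat(path):
    """Overwrite the heartbeat file, as the routing daemon would each interval."""
    with open(path, "w") as f:
        f.write(str(time.time()))


def heartbeat_is_stale(path, max_age=WATCHDOG_INTERVAL, now=None):
    """Return True if the file is missing or was not rewritten within max_age.

    Assumption: staleness is judged by file modification time.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return True  # no heartbeat file at all counts as stale
    return age >= max_age


def watchdog_check(path, restart, max_age=WATCHDOG_INTERVAL, now=None):
    """One watchdog wake-up: call restart() if the heartbeat is stale."""
    if heartbeat_is_stale(path, max_age, now):
        restart()
        return True
    return False
```

In this sketch the daemon has three chances to write the file between watchdog wake-ups, so a restart implies it missed all of them. That is why a busy or starved node (low RAM, interfaces bouncing) can trigger restarts without the daemon ever crashing.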
I'm not seeing any evidence of a crash, just seeing olsr starting up a couple times. Possible causes include:
1) RAM consumed and not available
2) some package was installed or some configuration change took place that caused the network interfaces to go down/up. (manually pulled dtdlink cable?)
3) too low voltage--general weirdness starts happening
What's different about this mesh node, anything?
Joe AE6XE