Hi all,
After a full year of operating a mesh of 14 nodes of several types, we are trying to summarize the problems seen during this period.
The only recurring problem is the following:
Without any evident correlation, several radios have reset themselves. The affected nodes were:
- 2 nodes equipped with Bullet M5: one reset 4 times, the other 3 times
- 1 NanoStation M5: reset 3 times
In all cases, restoring normal operation required accessing the node locally via the Ethernet port at the default URL 192.168.1.1:8080 and setting the node up again.
I searched our forum for similar occurrences but found nothing on the subject.
Has anybody seen similar troubles?
The only possible cause we see for this trouble is the remote reset mechanism via the POE injector; unfortunately, in the AREDN FW we do not see any way to disable this mechanism.
Any help on the subject is welcome.
Best regards and Happy New Year to everybody !!!!
Mike
We have experienced this from time to time at our sites. With 70+ nodes in operation, we have had only a few resets. Most of these seem to be related to intense electrical surges from storms. Some seem to be a result of water migrating through the Ethernet cables and reaching the connector at the POE injector. We did recently experience a total reset of 3 nodes at the same site for no apparent reason after a firmware upgrade to dev171 on one node. All of our nodes have been set up with the remote reset option enabled in the base firmware. I'm not sure whether turning this setting off would carry through when AREDN firmware is installed. I prefer to be able to reset the node remotely rather than having to take a trip up a tower anyway.
Mostly, the nodes end up back in an unprogrammed state with "NOCALL". Other than that, nothing really repeatable or reportable.
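For anyone who wants to spot this from a list of node names, here is a minimal Python sketch. It assumes only what is described above, that a reset node shows up with a name starting with "NOCALL"; the function name and the sample hostnames are hypothetical, and how you collect the names from your mesh is left out.

```python
def find_unprogrammed(hostnames):
    """Return the names that look like nodes reset to the unprogrammed state.

    Assumption: a reset node reports a name beginning with "NOCALL"
    (compared case-insensitively here, which is a guess).
    """
    return [name for name in hostnames if name.upper().startswith("NOCALL")]


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    seen = ["EXAMPLE-hilltop", "NOCALL-12-34-56", "EXAMPLE-parktower"]
    for name in find_unprogrammed(seen):
        print("possible reset node:", name)
```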
There is a handler definition that could be removed. It essentially clears the flash overlay (which is anything saved after first boot) when the reset button is held for a minimum of 12 and a maximum of 20 seconds. But would you want to incur the risk of climbing the tower if the node becomes unresponsive, versus a visit to the radio room to re-type the settings if it is reset for unexplained reasons? I'd not want to turn this handler off and risk something more painful. Although you should still be able to put it in TFTP mode, which occurs before the AREDN firmware is loaded.
If moisture is the culprit, this will continue to cause problems regardless. Is there any correlation with hardware model or anything else among the nodes that have unexplained resets to firstboot? If we can zero in on the root cause, maybe there is a better remedy.
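To make the timing window concrete, here is a rough Python sketch of the decision described above. It is not the actual firmware handler; the function name, the inclusive boundaries, and the "ignore" result for presses outside the window are all assumptions for illustration.

```python
FIRSTBOOT_MIN_S = 12  # minimum hold time, from the description above
FIRSTBOOT_MAX_S = 20  # maximum hold time, from the description above


def reset_action(held_seconds: float) -> str:
    """Classify a reset-button press by how long it was held.

    Assumption: the 12 to 20 second window is inclusive at both ends.
    What the real handler does outside the window is not covered here,
    so this sketch simply reports "ignore".
    """
    if FIRSTBOOT_MIN_S <= held_seconds <= FIRSTBOOT_MAX_S:
        return "clear-overlay"  # node falls back to the firstboot state
    return "ignore"


if __name__ == "__main__":
    for seconds in (3, 12, 15, 20, 25):
        print(seconds, "->", reset_action(seconds))
```

If the POE injector's remote reset line is held active by a surge or by moisture for a time in that window, the effect would be the same as a deliberate button press, which would fit the unexplained resets.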
Joe AE6XE
Both of you just confirmed my analysis.
I will check the cabling of the affected nodes and possibly disable the reset where the radio location allows easy access to the antenna, just in case.
Thanks for your help
Mike
In my relatively limited experience, the only similar events I've seen were not Ubiquitis resetting and losing their config. Instead, there were inadequate (as in nonexistent) exterior drip loops beneath the ice bridge, and a pair of ESD/lightning surge suppressors, mounted and grounded to the former TV broadcast station's ground system inside the structure, rusted internally and shorted to ground, taking a Rocket and a NanoStation off the air. The Ubiquiti ToughSwitch and the radios were fine once we coerced a bit of an exterior drip loop into the two cables and replaced the two ESD suppressors. Fortunately (accidentally) there were minimal drip loops inside the structure before that happened. There were very intentional interior drip loops afterwards<g>.
I know of an installation where the 'Certified' tower climber came down after a couple of stressful hours in 100 F plus heat up a 100' Tower with a Rocket's weather cover in his pouch. That Rocket never failed from rain because the cable drops at a steep angle to the first cable tie with no 'drip potential' into the radio. It is about 93' AGL and it worked flawlessly through the 2016 - 2017 "winter of atmospheric rivers", surviving more rainfall than has ever been recorded in the 100+ year history of WX records here in Mariposa County, CA.
That Rocket never missed a beat until the July 2017 Detwiler Fire took out PG&E power for 9 days and our household generator and UPSs eventually pooped out before we dared to haul in fuel. After power was restored it came back up with its proper config. The QTH radios all surviving was 'gravy'; we were happy enough to find the house, barn & outbuildings still standing and the barnyard animals still alive after the fires.
Our hearts go out to everyone impacted by this year's fires elsewhere in CA. It is a new world of evolutionary awareness.
This was the only thread I could find on this, but I just had a node reset itself for no reason last Friday. Unfortunately, it is the anchor node that not only hosts the tunnel connections, but also serves most of the services on our network. So, a rather unfortunate happenstance. Fortunately I have all of the tunnel connections and configurations saved, so I can get it all back up and running with no problem, but I would prefer to never have this happen again. If this had happened on our Park Tower node, it would be down until at least March. It sounds like it is a hardware problem, not software, but for whatever it's worth, this node is running version 667-7163819, is a Rocket M2, is fully indoors, and is connected to their smaller 2G sector antenna. It could have been some HF activity that I was doing with my 500 W amplifier, but it was nothing out of the ordinary and nothing I hadn't done probably a hundred times before.
I'm not sure whether this may be related to what you're experiencing: https://www.arednmesh.org/comment/16686#comment-16686