Plea for new stable release and/or release candidate

AA7AU

I know that the AREDN team is constantly working very hard to improve the firmware as well as to add more hardware options for all of us.

With that said, with the likelihood that some of our networks are going to be potentially put into [real] service soon, I would plead for a rededication of focus into getting out a new stable release or release candidate. Suspend new or beta features and please get out ASAP a replacement for 3.19.3.0 (now a year old) so that everyone can benefit from all the new improvements without the Tower of Babel problems related to using all the various Nightly Builds.

Personally, I would like to upgrade all of the nodes on our little mesh island up here in Idaho, but I won't do it with Nightly Builds.

Please?  And, again, thank you to everyone on the team for all the selfless efforts in support of better digital EmComm.

- Don - AA7AU
 

k1ky
I second that emotion. Due to the new OpenWrt "upgrade", we are experiencing problems upgrading many of the 32MB devices.
It would be nice to have a "production" version of Nightly 1234 so we can still install tunnel on those units.
 
K5DLQ
I suspect we will have something faster than you can say "COVID-19 vaccine"... ;-)
AE6XE
Have you tested the current nightly build? Doing so will accelerate the timeline for getting out a release and prove to the community that the nightly build is worthy of being released. We think it is ready to be released, but have we/you proved it?

Joe AE6XE
AA7AU
GL-AR750 tested OK

I just did a brand new NB# 1394 install with tunnel (first) and meshchat on a fresh-out-of-the-box GLiNet AR750 and all seems to be working well with RF, LAN, tunnels, WiFi client, etc. Only note is that the WiFi WAN client only seems to work over the 2.4 GHz radio and not the 5.8 GHz one, which I suppose is that continuing issue with not having the drivers for it. I was not immediately able to test DTD but wouldn't expect problems there.

Thanks!
- Don - AA7AU

PS: I have a Bullet M2 with 3.19.3.0 that I'll try to upgrade later today or tomorrow.

AE6XE
Can you confirm that the 5GHz radio is not already selected for the LAN AP? It would not be an option for the WAN WiFi client if it is already in use.

Joe AE6XE
AA7AU
5Ghz WAN

No. I de-selected the Access Point setting, saved, and then selected the WAN WiFi-client toggle, saved and rebooted - similar to what I did to test the 2Ghz radio option for WAN WiFi client. Got no error messages or anything; it just would not connect to my Samsung Galaxy S5 cellphone hotspot over 5Ghz (but did over 2Ghz).

edited to add: same results on 5Ghz as with another AR750 running 1234.

- Don - AA7AU

ps: my tunnel tests were from the AR750 running 1394 to a couple of different tunnel servers running 3.19.3.0 - but I did not test inbound tunnel connect.

AE6XE
Don, numerous people have used the WAN WiFi client on 5GHz, so there must be something new or unique going on. Please boot up with these settings, let it run for 5 or 10 minutes, then capture a support data file to upload here. I don't see that this has yet risen to an issue that would stop a release -- we would need multiple people to reproduce it.

Joe AE6XE
AA7AU
Serious Pilot Error - pull up, pull up, beep beep beep

I apologize profusely. I did NOT do my due diligence on this. It turns out that my Samsung Galaxy S5 _will_ operate as a hotspot under my VZ plan; HOWEVER, there is a deeply buried "advanced" setting for selecting which one of the "broadcast channels" to use, and mine was set to 2.4. I just discovered this while upgrading my Mikrotik Haplite to NB 1394.

Hopefully that explains the strange error message/condition discovered in the bowels of the AR750 previously. It's next up for the bench.

When Joe says "gee, it's worked for everyone else" - pay attention and don't auger in like me. I feel really badly about wasting his time.

Truly sorry about that,
- Don - AA7AU

kj6dzb
Joe

It would be nice to have a new release for device compatibility! The 32MB nodes do run the nightly, at least for me. Do you think the older hardware will handle the new OS, from the info received so far? I have a growing number of devices that require the current device support (I have another new LHG coming in a week, and an LHG HP model that needs to get checked out).

Q: I would like to suggest that, if space allows, you add back a way to enable the OLSRD status page.

Q: Can the project start thinking about a hard migration to OLSRD2? A new nightly fork?

Q: Might running multiple instances of OLSRD for each interface, each with a different timing metric config, help limit the overload? Or is it time to migrate to OLSRD2? Could a config patch be developed after the 2020 release, with an updated OLSRD config released later to mitigate the issue addressed below?

I would like to address the OLSRD flooding events seen recently in the SoCal and SF mesh networks. The possible cause is still being studied. A few ideas: latency between tunnel servers connecting over the Comcast network; network outage events or network saturation during disaster events; TC message floods (across tunnels ---> RF hub nodes) that cause nodes to crash. Where only the RF topology adjacent to a node is slightly affected, nodes don't crash. Networks whose topology relies on tunnel links that pass through an ISP or cellular system experience OLSRD overloads on system memory. Systems may stall and/or crash on the node running a tunnel. This doesn't affect the raw RF <---> RF nodes.

73 Mathison Kj6dzb

AE6XE
Long winded here, but I suspect many will appreciate the details of OLSR issues and current state.

"Do you think the older hardware will handle the new OS, from the info received so far?" All the 32MB RAM devices (the older hardware) are running the firmware. It's only in a sysupgrade situation, where a spike in RAM is needed, that we are seeing symptoms so far.

"Q: I would like to suggest that, if space allows, you add back a way to enable the OLSRD status page."
The same information is available through the command line, and is also included in a support download to inspect offline. The cost to maintain this page wasn't/isn't deemed justified.

"Q: Can the project start thinking about a hard migration to OLSRD2? A new nightly fork?"
This is under consideration for after this release, and competes for time with replacing the Perl UI with something more efficient (the UI is the biggest factor consuming RAM). We'd likely go with BATMAN instead -- both paths are incompatible with OLSRv1 and will present a challenge for everyone to plan out and upgrade their mesh islands all at once. It is not yet known whether the current OLSR storms have a root cause in the latest version of OLSR in AREDN, or are a factor of custom user OLSR configurations across the mesh islands. Regardless, upgrading away from OLSRv1 is desired by everyone and a good thing.

"Q: Might running multiple instances of OLSRD for each interface, each with a different timing metric config, help limit the overload?" Unlikely, from what I know. Ditto from above: it is not yet known whether the OLSR version in AREDN is breaking down due to scaling and/or the mesh complexity, or another cause. This needs to be fleshed out to best determine a path forward.

"Systems may stall and/or crash on the node running a tunnel. This doesn't affect the raw RF <---> RF nodes." The tunnel itself and the performance over the tunnel are unlikely to be a factor. Layer 3 OLSR doesn't know what layer 2 channels the bits are transmitted over. Both RF and internet paths can have unusable or marginal links with data loss and lengthy delays. This is the normal operating environment for OLSR.

One area to check is whether all the OLSR instances on the mesh are current and patched to the most recent implementation used in AREDN. Mark N2MH and I have been discussing the custom tunnel and PBX deployment connecting many mesh islands around the globe. It currently runs a very old version of OLSR that is missing numerous bug fixes. Consequently, Mark is looking to bring it current -- and we will then see if this makes a difference, or whether we need to keep looking for other possible factors.

Put into perspective that, of the defects found and fixed in OLSR in the past couple of years, these were not scalability issues per se -- symptoms occurred on a 10-node mesh just the same as they would show on a 1000-node mesh. No one knows how high OLSR will scale with our current hardware. But certainly old Linksys WRT54G devices running old versions of OLSR on ~20-year-old hardware simply don't have enough horsepower to scale.

What we have right now is a perceived ceiling. OLSR isn't working harder to communicate information when going from a 500-node to a 1000-node mesh network, or falling behind per se. OLSR will have a ceiling on how many packets it will send in a second, based on timed triggers looking to send data. This may not be overloading any of our nodes; rather, it just means there may be delays in learning about a route or device that can be communicated with -- getting an update about a node on the mesh every 5 min, rather than every 1 min.
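
To put rough numbers on that ceiling, here is a toy calculation in Python. It assumes, purely for illustration, that a node hears about a fixed number of other nodes per second (`updates_per_second` is a hypothetical budget, not a measured or configurable OLSR value), so the time to hear about every node once grows in proportion to mesh size.

```python
def refresh_interval_s(node_count, updates_per_second):
    """Seconds needed to hear about every node once under a fixed update budget.

    Toy model only: 'updates_per_second' is a hypothetical ceiling,
    not a real OLSR setting.
    """
    if updates_per_second <= 0:
        raise ValueError("updates_per_second must be positive")
    return node_count / updates_per_second


# Hypothetical budget of 5 node updates per second.
for nodes in (500, 1000):
    print(nodes, "nodes ->", refresh_interval_s(nodes, 5), "s per full refresh")
```

Under this model the per-second load on a node stays flat as the mesh doubles; only the refresh delay doubles.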

Where we most likely have a performance bottleneck is that DNS is reset every time a node receives new hostname information. Resetting DNS 10 times a second is horribly inefficient. We only need to do that once every 10 or 15 seconds at most.
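
As a rough illustration of that idea (not the actual AREDN code), the sketch below coalesces hostname updates so the reset runs at most once per interval. The class name, `reload_fn`, and the 10-second default are all hypothetical; the simulation just counts how many resets would happen with 10 updates per second over one minute.

```python
class RateLimitedReload:
    """Coalesce many reload requests into at most one reload per interval."""

    def __init__(self, reload_fn, min_interval_s=10.0):
        self.reload_fn = reload_fn          # hypothetical "reset DNS" callback
        self.min_interval_s = min_interval_s
        self.last_reload = None             # time of the last reload, if any
        self.pending = False                # True when an update awaits a reload

    def notify(self, now):
        """Call whenever new hostname information arrives."""
        self.pending = True
        self.tick(now)

    def tick(self, now):
        """Call periodically so a deferred reload eventually runs."""
        if not self.pending:
            return
        too_soon = (self.last_reload is not None
                    and now - self.last_reload < self.min_interval_s)
        if too_soon:
            return
        self.reload_fn()
        self.last_reload = now
        self.pending = False


def simulate(updates_per_second=10, duration_s=60, min_interval_s=10.0):
    """Return (updates received, reloads performed) for a steady update stream."""
    reloads = 0

    def fake_reload():
        nonlocal reloads
        reloads += 1

    limiter = RateLimitedReload(fake_reload, min_interval_s)
    total_updates = updates_per_second * duration_s
    for i in range(total_updates):
        limiter.notify(i / updates_per_second)
    return total_updates, reloads


print(simulate())
```

With these made-up numbers, 600 updates in a minute trigger 6 resets instead of 600. The cost is that a new hostname may take up to one interval to become resolvable.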

Joe AE6XE
