Work has progressed on an auto distance setting, thanks to recent upstream contributions. I'm looking for comments and wider testing before committing it to the nightly build; a change to hook it into the UI is still to come. It can be manually turned on and tested now.
https://github.com/aredn/aredn_ar71xx/issues/210
Joe AE6XE
You amaze me more every week.
I really think this will help reduce turn-around times and latency over fixed 'distance-set' networks.
My only question is how dynamically it checks its connections and auto-sets the distance. Is it once per minute, once per hour?
I'm asking because it may affect a network with mobile nodes, which I admit is a niche case.
I'm also curious if it uses some iperf tool to do this. If so, then my dreams of having a tool like iperfSpeed built into the base AREDN firmware may be closer to realization.
I proposed that we have a tool like iperfSpeed built into the base FW because throughput, not SNR or RSSI, is the best way to test links and connections. Otherwise, anybody who wants to manage/test/regulate the network has to negotiate with each node owner to install iperfSpeed on their nodes, and trying to make the network run better that way is like herding cats. If all nodes had an iperf tool on them, we could use a net-monitor tool to automatically catch outages or degradations in the network, etc. Am I smoking crack?
Once again, thank you, Joe.
- Damon K9CQB
Orv W6BI
If I understand how distance to furthest node affects turnaround and latency...
If DTFN is akin to FrameRetry in AX.25, then if the other station hears 100% of the transmitting station's packets there will never be a need to retransmit a frame, and the only way DTFN can hurt is by being too short. Ergo, increasing DTFN has no effect on latency, while too short a DTFN will create unnecessary retransmissions.
If the receiving station received the packet, but the channel is busy and it does not respond within the DTFN time, there will be an unnecessary retransmission. Ergo, increasing the DTFN time allows the receiving station extra time to return an ACK. This may help when sending data to an 'exposed node'. Retransmitting unnecessary frames contributes to 'latency'.
IMHO, the number of 'hidden nodes' in a circuit harms turnaround and latency exponentially.
In a point-to-point link I can see a benefit in auto-DTFN.
IMHO, it would take a wizard to write a program for auto-DTFN in a circuit with 'exposed nodes' and/or 'hidden nodes'.
So, yes, it is possible.
YMMV,
Chuck
Ack timeout too short: the node will think the data went missing and retransmit the frame, risking a collision with the ACK that is just about to be received. This triggers back-off timing before the retransmit and really bad performance.
Ack timeout too long: if a data frame does go missing, then we wait extra time before deciding it is missing and retransmitting. Better to be too long than too short.
The ack timeout is only related to hidden nodes or exposed nodes in that these conditions cause more collisions to occur, and thus cause more ACK responses to be lost. 802.11 uses RTS/CTS to coordinate with hidden nodes, so this is not a rampant problem.
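For a rough sense of scale (my own back-of-the-envelope number, not taken from the implementation): radio waves cover roughly 300 meters per microsecond, so propagation alone costs about 6.7 us of round-trip time per km of path, before any processing time on the far end. A quick illustration from a shell:

# Rough round-trip propagation time for a hypothetical 60 km path
# (illustration only; the real ack timeout also covers slot time and
# processing overhead on the far node).
awk -v km=60 'BEGIN { printf "%.0f us round trip\n", (2 * km * 1000) / 300 }'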
The issue with putting iperf in the default image is that we're pushing the limit on available space with 8 MB flash devices. It's to the point that there's concern there will be no remaining room for tunnel packages. Since not everyone uses iperf, it's a difficult decision to include it.
I've heard of using SNMP to monitor CPU load on servers and for general network management, but I never thought of using SNMP to track the bandwidth of all of a mesh network's links for outages or degradation. I would love for someone more familiar with SNMP to show how we could use it to manage our AREDN networks.
Right now we're trying to find a way to run a net-monitor tool that does an iperf test across each node every 10 minutes, or if that's too network intensive, every hour -- something like that. Despite being a bandwidth hog, iperfSpeed has been our most effective tool so far for improving and fixing our AREDN mesh network, link by link.
Perhaps we should start a new thread and possibly a working group to find the best network management tool for AREDN networks.
I'm going to get smarter on SNMP.
- Damon K9CQB
Orv W6BI
For the benefit of those not familiar with this, SNMP is based on a centralized SNMP Manager and software agents residing on the nodes you want to manage. The agents contain a Management Information Base (MIB) that's been coded to take snapshots of a node's operating parameters, including interface byte counts, for example.
As a network manager, you may be interested in knowing the data throughput of, say, the eth0 interface or the RF interface. You would configure the SNMP Manager to query the node for the byte count on that interface. The node would respond with that count and a timestamp for when that value was captured. A second similar query, some time later (seconds, minutes, or whenever), would again return the count and a timestamp. The difference in the byte counts divided by the difference in the timestamps equals the average data throughput of that interface in bytes per second.
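As a rough sketch of that arithmetic from a management host (the node address, community string, and interface index below are just placeholders, and this assumes the standard net-snmp command-line tools):

# Sample the interface byte counter twice and divide by the elapsed time.
HOST=10.54.1.2                 # hypothetical node address
OID=IF-MIB::ifInOctets.2       # hypothetical interface index
b1=$(snmpget -v2c -c public -Oqv "$HOST" "$OID"); t1=$(date +%s)
sleep 60
b2=$(snmpget -v2c -c public -Oqv "$HOST" "$OID"); t2=$(date +%s)
echo "average throughput: $(( (b2 - b1) / (t2 - t1) )) bytes/sec"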
The SNMP agent is an optional package installation on AREDN.
The last I looked, the standard SNMP taxonomy doesn't support all the parameters we may be interested in collecting, so someone will likely need to define what would best serve our needs and then modify the MIB to capture and return the corresponding data when queried. There was an effort started a couple of years ago to do this. I don't believe it got very far, but if anyone's interested in tackling this, they are welcome to.
There are many SNMP Managers available and some are opensource or free to use in small networks… usually based on the number of queries you are routinely running. MikroTik offers a free one, in fact, called “The Dude” [ https://mikrotik.com/thedude ]. Another which I’ve used is PRTG [ https://www.paessler.com/prtg ].
Andre
These are tools that work together, but each primarily serves a different purpose:
iperf: a traffic 'generator' used to optimize wireless settings with the goal of ensuring maximum data throughput capability, best used RF link by link.
snmp: a traffic 'monitor' of the run-time loading of a network to know node health and find bottlenecks
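For example, a basic link-by-link iperf test looks something like this (assuming the iperf package is installed on both nodes; the node name below is a placeholder):

# On the far end of the link, start a server:
iperf -s
# On the near end, run a 10-second TCP test across the RF link:
iperf -c far-node.local.mesh -t 10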
Joe AE6XE
Several times a second - that's excellent. I'm pretty excited about that.
With fixed 'distance settings' we have run into problems when installing a node [Node A] that points at an existing directional node [Node B], where Node B is already shooting to a closer node [Node C]. Node B's 'distance setting' is therefore too small and we're not making the link. So we have to find the owner of that node and call him, and hopefully he has access to his node so he can change his 'distance setting' to include our more distant node. Hopefully that only takes 45 minutes to an hour while we're hanging out on a tower or rooftop. Then we can get back to work refining the panel/dish pointing and finalizing the install.
As far as putting in iperfSpeed and other tools that require memory space, I agree that the devices with limited memory couldn't handle that 'enhanced firmware'. Do you guys feel like we're going to have to make a decision in the near future that certain nodes with limited memory will continue to be developed for, but will not be able to receive 'enhanced features' or load iperfSpeed or MeshChat because there's no memory left? Or are we just going to leave enhanced features out of the software so we don't have to manage separate firmware loads for Bullets and AirGrids, for example?
- Damon K9CQB
The testing so far has found great results, but only with links in the 20 to 30 km range and below. I just tested this on a link with 60 km real distance. Be advised, the auto setting does not work at this distance -- something in the calculation leaves it stuck at the 20 km setting -- too big a gap, and it's probably going out of bounds somewhere. Until I get a fix, it's best not to use this on the really long links.
Orv, Eric, can you comment on the distances you tested with, to confirm the 'good' distances to use for now?
Joe AE6XE
Thanks for the heads-up, Joe -- I was just getting ready to test on our 55 km link. Will stand by for a fix. I was wondering how that was going to work. Also, we just implemented a "mobile" node with an hAP paired with an LDF mounted on a DISH Network dish. This feature will become very handy for these applications where distance measuring isn't readily available.
Joe, my testing was only out to 8 km. Eric's was quite a bit farther, at 26.2 km. His is just about the longest link we have, so we'll deploy, cautiously.
Orv W6BI
Joe, I've done a bit of testing and find some inconsistencies that may need investigation.
I tested two nodes on my tower, an NSM2 and a Powerbeam M5 300. Both talk to nodes on the same remote tower, 7.68 km (7,680 meters) away.
Using the command cat /sys/kernel/debug/ieee80211/phy0/ath9k/ack_to, I get these results:
NSM2: 196 to 344, then back to 197 A
Powerbeam M5 300: between 79 and 96 A
Setting the NSM2 to 8000 meters manually yielded 115 S
Setting the PBM to 8000 meters manually also yielded 115 S
(BTW, the formula you gave me to derive the Static setting from the distance in meters, (8000 / 151.515151 + 64), yields 116, which is close.)
So if the formula is correct, and I'm applying it correctly, then the auto setting for both of them ought to read about 115.
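For anyone checking my math, that arithmetic (assuming the formula really is meters / 151.515151 + 64, truncated to an integer) is:

# Static ack_to for an 8,000 meter path, per the formula above.
awk 'BEGIN { print int(8000 / 151.515151 + 64) }'   # prints 116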
It appears neither of the nodes is calculating the distance accurately, or even approximately.
Any idea what's going on? (note that under auto distance, both appear to have normal throughput, although I've done no serious testing).
Orv W6BI
According to ath9k, the max configurable value in AR_TIME_OUT for acktimeout is 0x3fff. The max ack_to we can configure (assuming clockrate set to ATH9K_CLOCK_RATE_2GHZ_OFDM) is ~372us (~55km).
We can try to set MAX_DELAY to 360 (max distance ~54km). If you confirm it works properly I can post a patch (or you can take care of it, up to you).
I suspect those specs are based on what the manufacturer has tested with, or some older chipset ath9k supported -- we can obviously see it has worked great with 60 km+ distance settings and isn't physically limited in the chip. We'll likely need settings over 1000us.
The values we are seeing also float above the actual physical distance, but remember, the time is round trip and includes the environment plus the processing time on the other end for the device to respond. With all these conditions, at these distances, the algorithm might be further optimized. Step 1: create a fix so that it works at 60 km+. Step 2: we'll then have actual data to review and can begin a dialog on how this can be further optimized. We don't yet know if the value floating higher than the physical distance is good or bad.
For those not wanting to participate in this activity, just don't set the distance to '0', and everything stays as it was before.
Joe AE6XE
I originally saw about a 1.1Mb/s increase in my traffic up to the hill (upload), and a 4Mb/s increase in the other direction (download).
I just finished loading up the most recent nightly build and the numbers I am getting are still basically the same. The link is currently just between my node and the hilltop node; there are no other RF neighbors on the hill top. I do notice the ack_to number increasing by about 10 when I do an iperf test, but only when *I* am the client (uploading).
The hilltop node's auto distance number does not change during this same period though, only mine. About all I can think of is multipathing.
In my case it would be "reflections" off the nearby building roofs and other things, I don't know, just thinking out loud... :)
ssh into the node and run
cat /sys/kernel/debug/ieee80211/phy0/ath9k/ack_to
You'll get a number then a letter; A is Automatic, S is Static.
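If you want to watch it change while traffic is running (say, during an iperf test), a simple loop on the node will do (this assumes the BusyBox shell on the node and that phy0 is the mesh radio):

# Print the current ack timeout every 5 seconds; Ctrl-C to stop.
while true; do cat /sys/kernel/debug/ieee80211/phy0/ath9k/ack_to; sleep 5; done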
This will help in urban environments like we have here in Northern Virginia, where we can close a link, but we need to make sure we're not bouncing it off of a glass and steel building (which happens more often than you'd think).
-Damon K9CQB
Orv W6BI
It does not ask for a reboot when changing to Auto from another distance setting. Just hit Apply.
Thanks!
Orv W6BI
After discussion with the original author of the auto distance setting now in the nightly builds, there is a path forward to address the longer-distance links. The current implementation is limited to nodes that are no more than somewhere around ~20 km apart. Please keep your testing and usage of auto distance within this range. Very good results have been reported so far in this range.
The current implementation breaks down when the ack timeout is too short (not a usable link): it then can't measure the actual round-trip response time, because it has already moved into a mode of retransmitting the data. In this condition the logic turns to the packet handshaking from a process used for encryption, called wpa_supplicant (or wpad). AREDN doesn't run this process since we're not doing encryption. Consequently, there isn't a trigger to get out of the too-short-ack-timeout condition. This appears to be straightforward to address.
Joe AE6XE