Everyone in SoCal is seeing a significant degradation in link performance; many links that have been great quality for months and years are suddenly unusable. Symptoms appear across all channels in all bands. At a major tower site where 6+ P2P links come in, links that are usually near 100%/100% LQ/NLQ have been showing ~30%/60% for the last couple of days. This is going on from San Diego up through Ventura, maybe beyond.
Anyone elsewhere seeing similar?
Joe AE6XE
Keep in mind those maps are only predictions, but there does seem to be a correlation.
If you look now, you'll notice an orange "blob" just off the So. Cal. coast; that sucker has been hanging around for a couple of days and is finally moving away.
This was that same map a couple of days ago: Those dashed lines indicate an "Unstable" area due mostly to local Thunderstorms.
http://www.dxinfocentre.com/propagation/hti.htm
It got worse after that, the pink area in the center got larger but was still off the coast, and then today the red area was right on top of us up here in the Ventura area (it may not have stretched all the way to the OC).
Before today (and for the last few days) it has mostly been an evening phenomenon that would go away a couple of hours after sunset, or by about midnight-1am; it was consistent.
Today, the link degradation lasted all day and it did not let up. It has only just started to get better as of now.
I am certainly no expert on this kinda stuff and this should be taken with a grain of salt (or 3), but I think there may be a pattern here.
Keith, I thought SMP was also experiencing this on the link to Elsinore. But it's not. It's definitely a radio issue. Unfortunately, I also can't talk to the switch, so I can't even power-cycle the radio. I'll need to get back up there.
The San Diego network as a whole has been rock-solid throughout this. Interesting we weren't affected down here.
I am seeing a lot of meshchat traffic and other traffic. It would be a very good idea to turn off meshchat instances, particularly if there are other instances that can still be used. Map crawlers can be turned off too. This is just a process of elimination.
Check the link rates; if these are still good or high, then this isn't an RF-related issue.
Joe AE6XE
The link rates I am seeing are still decent, slightly worse than usual but still reasonable given the LQ/NLQ.
What we're not seeing (please post if you are seeing otherwise):
A) enough OLSR traffic that it is a root cause or significant contributor to the symptoms. I'm only seeing ~8 to ~18 OLSR packets being sent out from a given node in a second. 18 x 1500 bytes/packet is tiny in comparison to our Mbps links (see the rough numbers after this list). The traffic, while it is drifting up, isn't enough volume to explain this issue, and is expected traffic when links are going up and down.
B) flooding of the network with traffic everywhere. We're just not seeing data flooding the network.
C) link rates are still relatively good/high. If we were having atmospheric ducting issues, bringing in more noise, etc., then the link rates would be correspondingly dropping with the additional interference and/or noise.
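For a sense of scale on point A, here is a quick back-of-the-envelope Python check using the worst-case figures above. The 6.5 Mbps floor is the nominal 802.11n MCS0 rate at 20 MHz, used only as an assumed illustration, not a measured value:

pkts_per_sec = 18             # worst-case OLSR packets per second quoted above
bytes_per_pkt = 1500          # worst case; most OLSR packets are far smaller
olsr_mbps = pkts_per_sec * bytes_per_pkt * 8 / 1_000_000
print(f"worst-case OLSR load: {olsr_mbps:.2f} Mbps")          # ~0.22 Mbps
mcs0_mbps = 6.5               # slowest usable 802.11n rate, 20 MHz, 1 stream
print(f"share of an MCS0 link: {olsr_mbps / mcs0_mbps:.1%}")  # ~3%

Even against a link that has fallen all the way to MCS0, that is only a few percent of capacity.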
Joe AE6XE
Wondering if there are non-standard or modified nodes running that could be contributing? i.e. Pis running olsr, HamWAN linking experiments, PCs running olsr, etc...
(not pointing fingers, but, just thought this may be a good data point)
I skimmed the network mapper database, and found no 'non-standard' nodes that might be running funny OLSR daemons. We had seen one or two in the past, but that's not the case at the moment.
When the "problem" was happening, all of my links including the tunnels went to crap. Someone who knows more will have to figure out why the DtD and 4 foot RF links that have no traffic were affected.
And as I type this on Monday morning, all is healthy...
I see we have another tropo hot spot that's developed right off the coast. http://www.dxinfocentre.com/tropo_wam.html
Anyone else seeing sudden link degradations this afternoon?
I'll email the list out to the SoCal Hamnet mailing list.
While this is a degradation we may be able to attribute/confirm to environmental conditions, it doesn't fully explain the LQ% we were seeing. These are UDP broadcast packets that are going missing. What we need next time is the link rate table, to see why a given MCS rate is selected and the packet success rate over the link. If the link itself is showing only 5% loss for the chosen MCS rate, but OLSR UDP packets are showing 80% loss, then there's still more to explain.
The magic of 802.11n is that it is supposed to keep rolling with the punches of the environment: inversion layers, tropo ducting, fading, etc. This means the modulation and coding schemes are changed to maximize the data throughput possible given the current conditions. Thus, the link rates go down as conditions worsen, but the quality of the link (LQ%) should be maintained (or minimally affected) until the link drops to the lowest setting of MCS0; below that, it can't function.
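To put rough numbers on that, here is a small Python sketch of the nominal single-stream 802.11n rates at 20 MHz (long guard interval). Treat these as illustrative only; links on 5 or 10 MHz channels scale down accordingly:

# Nominal 802.11n data rates, one spatial stream, 20 MHz, long GI (illustrative).
mcs_rate_mbps = {0: 6.5, 1: 13.0, 2: 19.5, 3: 26.0,
                 4: 39.0, 5: 52.0, 6: 58.5, 7: 65.0}
# Rate adaptation stepping from MCS7 down to MCS0 gives up ~90% of the
# throughput, yet LQ/NLQ should stay high until even MCS0 stops working.
for mcs in sorted(mcs_rate_mbps, reverse=True):
    print(f"MCS{mcs}: {mcs_rate_mbps[mcs]:5.1f} Mbps")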
Joe AE6XE
Compare the actual received OLSR packets with what OLSR says is received. This can be done on the node, after installing the tcpdump package:
tcpdump -w /tmp/<hostname>.pcap -c 500 port 698
tcpdump -w /tmp/<hostname>.pcap -c 500 -i eth0.2 port 698 <- coming in over dtdlink
ifconfig will show interfaces other than eth0.2 (dtdlink) to substitute for tunnel, etc. -c 500 says capture 500 packets. Copy the pcap data file in /tmp down to your computer and open it in Wireshark. Each packet from a given neighbor will have a sequence # in the OLSR protocol. Out of 10, what % is missing? Does this match the LQ that OLSR shows for that neighbor? (Did we receive it on the interface, and did OLSR also receive it, or was OLSR blocked/busy and lost it?)
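If eyeballing sequence numbers in Wireshark gets tedious, here is a rough Python sketch of the same check run against the copied pcap. It assumes the standard RFC 3626 OLSR packet header (the packet sequence number is the second 16-bit word of the UDP payload) and that scapy is installed on your computer; the filename is a placeholder:

import struct
from collections import defaultdict
from scapy.all import rdpcap, IP, UDP

seqs = defaultdict(list)
for pkt in rdpcap("node.pcap"):                    # substitute your <hostname>.pcap
    if IP in pkt and UDP in pkt and pkt[UDP].dport == 698:
        payload = bytes(pkt[UDP].payload)
        if len(payload) >= 4:
            # OLSR packet header: length (2 bytes), then packet sequence number
            seqs[pkt[IP].src].append(struct.unpack("!H", payload[2:4])[0])

for src, s in seqs.items():
    missed = sum(((b - a) % 65536) - 1 for a, b in zip(s, s[1:]))  # skipped seq numbers
    expected = len(s) + missed
    print(f"{src}: received {len(s)} of {expected} ({100 * len(s) / expected:.0f}%)")

Compare that received percentage with the LQ that OLSR reports for the same neighbor; a big mismatch would point at packets being dropped after the radio rather than on the link itself.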
Joe AE6XE
Yes, once I have the file, I know how to upload a package to an AREDN node (I do it every time I do firmware updates)...
https://arednmesh.readthedocs.io/en/latest/arednGettingStarted/advanced_...
Scroll to "Package Management".
On a node with internet access; select:
Setup -> Administration -> Download Package -> Select Package (from drop-down menu)
tcpdump-mini 4.9.2-1
Hope this helps, Chuck
I would not think it's my map; the issue with the older FW and my mapping scripts was fixed a long time ago.
I just stopped polling the nodes the way I was; now it's all HTTP when going over the mesh.
The amount of data coming back from a remote node to one of my mappers is small; how big is the sysinfo.json output? That's all that is polled for, and even then only once per hour.
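For what it's worth, one poll is easy to measure. A small Python sketch, assuming the usual AREDN sysinfo.json path and a placeholder hostname (the exact URL my mapper hits may differ):

import requests

url = "http://localnode.local.mesh/cgi-bin/sysinfo.json"   # placeholder node/path
resp = requests.get(url, timeout=10)
resp.raise_for_status()
size = len(resp.content)
print(f"{size} bytes per poll, ~{size * 24 / 1024:.0f} KiB per node per day at one poll/hour")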
Anyways... Over the weekend I was talking to a friend of mine up in Santa Barbara and he does a lot of work with some pretty long range 11GHz links, 40-60 miles.
I asked him if over the last 2 weeks or so he was seeing this same thing and his answer was "Yes, Like you would not believe".
So I still don't know. I agree with Joe's post #7 in this thread that if it happens again we should start turning off some of the "distributed" services and see if it makes a difference... I kind of think it won't, but it's worth a shot. :)
Also, if we're going to think about the "odd devices" running olsr, what about something/someone trying to come "in" from a mesh gateway causing it too? Who knows at this point; there are several "ways in" too.
If we see it again around here, I'll try to capture some packets and see what's in them.
In Ventura here, we (and I) have changed nothing and it's all gone back to normal.
*edit* Sorry to repeat what Orv already said, and you're right Joe, I completely forgot to look at that link mod rate table to see what it was doing during all this, very good point!