I just deployed a Rocket M5 XW with an omni antenna. I have several of these deployed at this time with no difficulties to date, but what makes this one different is that it is DtD linked to a PowerBeam M5 400 and a NanoBridge M2. All have been running the latest release candidates, and the longest cable run to any single node is about 60'.
Since the XW went live on Friday, I have been experiencing dropouts of this node and an attached node. By dropout, I mean losing the ability to navigate to the individual node's GUI. There is no evidence of the node resetting or losing power, nor is there evidence of dropouts in connectivity to the ToughSwitch. Originally it appeared it might be an IP conflict; however, that was worked through and appears to be resolved. I experienced the same “drop” while in an SSH session to the node. I am actually not convinced that the problem is with the XW; it might be in the PowerBeam, as that is the newer technology and is using the gigabit connection to the ToughSwitch. Andre, Joe, and Conrad, this feels like it may be similar to the experience we had while on the Elsinore tower working with the PowerBeam to Sleeping Indian. I think that was a cable-length issue, solved by locking the node to 100 Mbps. This cable run is significantly shorter than what we dealt with there, so perhaps I am on the wrong path.
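For what it's worth, here is roughly how I ruled out a reboot or power loss from the SSH session; a minimal sketch using the stock BusyBox tools on the node (the grep patterns are guesses at the driver's exact log wording, and eth0.2 being the dtdlink VLAN is an assumption):

# uptime should keep climbing across the drops if the node never rebooted
uptime
# look for Ethernet link up/down events in the kernel log
dmesg | grep -i link
# check the dtdlink VLAN counters for accumulating errors
ifconfig eth0.2 | grep -E 'errors|dropped'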
I did load 171 to the XW this morning with no change and was thinking of trying it on the PowerBeam.
Ideas?
Keith - AI6BX
Sounds like the symptoms I have seen a time or two; this is already being looked into by the dev team.
Best bet for now might be to turn off the new device creating the new link until it gets sorted, as it sounds like that's your trigger point. Other than that, it's probably not the hardware.
Keith
eth0      Link encap:Ethernet  HWaddr 44:D9:E7:D1:8A:67
          inet addr:10.132.83.57  Bcast:10.132.83.63  Mask:255.255.255.248
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:126011 errors:0 dropped:0 overruns:0 frame:0
          TX packets:35988 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:107887293 (102.8 MiB)  TX bytes:15449412 (14.7 MiB)
          Interrupt:4

eth0.1    Link encap:Ethernet  HWaddr 44:D9:E7:D1:8A:67
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14967 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7487 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4909176 (4.6 MiB)  TX bytes:2560554 (2.4 MiB)

eth0.2    Link encap:Ethernet  HWaddr 44:D9:E7:D1:8A:67
          inet addr:10.209.138.103  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:99806 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28500 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:100237923 (95.5 MiB)  TX bytes:12744568 (12.1 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:56560 errors:0 dropped:0 overruns:0 frame:0
          TX packets:56560 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5668893 (5.4 MiB)  TX bytes:5668893 (5.4 MiB)

wlan0     Link encap:Ethernet  HWaddr 44:D9:E7:D0:8A:67
          inet addr:10.208.138.103  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:73370 errors:0 dropped:0 overruns:0 frame:0
          TX packets:73873 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:94195022 (89.8 MiB)  TX bytes:98572599 (94.0 MiB)

wlan0-1   Link encap:UNSPEC  HWaddr 44-D9-E7-D0-8A-67-00-44-00-00-00-00-00-00-00-00
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:111250 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:104605454 (99.7 MiB)  TX bytes:0 (0.0 B)
http://bloodhound.aredn.org/products/AREDN/ticket/234
If you want to give me access to W6LAR's Rocket XW Omni, I can log in to remotely monitor and help get to the bottom of it. Check out the TS port statistics; is anything different between the PBE M5's port and the Rocket M5's? I put a PBE-M5-620 on a tower with a ~100' cat5 into the TS. I did not have to do any special configuration for the 1 Gbit port on that mesh node, and it has always worked fine. But that cable may be shorter than the run behind the issues we saw at Elsinore Pk.
For others' benefit: these newer PBE devices come with 1 Gbit ports instead of a 100 Mbit Ethernet port. The ToughSwitch was having an incompatibility establishing a stable link with the mesh node over the cat5, and we ended up using a custom mesh node setting to lock it at 100 Mbit. I didn't have to do that at another site, guessing due to a shorter cable.
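For reference, the lock looks something like this from the node's shell; a sketch only, assuming ethtool is installed (it is not part of every image) and that eth0 is the port feeding the ToughSwitch, not necessarily the exact setting we used:

# force the PHY to 100 Mbit full duplex, autonegotiation off
ethtool -s eth0 speed 100 duplex full autoneg off
# confirm what the link actually came up at
ethtool eth0 | grep -E 'Speed|Duplex|Auto-negotiation'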
Joe AE6XE
Joe,
I sent you the login info for the Rocket XW as well as the Nanobridge M2 via email. I can also shut down the M2 from the switch to see if that has any positive impact.
No TX or RX errors in the TS at this time. I did also just start system error logging in the switch for drops etc.
Thanks,
Keith
After monitoring and poking around a bit, here's what Keith and I have discovered.
There's a 'dtdlink' cluster of 3 nodes; call them A, B, and C. We are accessing node C from node A over DtD cat5 links (ToughSwitch in the middle). I discovered a lot of unrelated UDP traffic also going between nodes A and C. As I monitored this link, the OLSR Hello UDP packets were also on it. These packets were not all arriving on the other side; OLSR was dropping down to 89% LQ and 100% NLQ, and at that point it was showing an ETX of 1.123 for this DtD link.
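If anyone wants to watch the same traffic, the Hello packets can be seen from the node itself; a sketch, assuming tcpdump has been installed (it is not in the stock image) and that eth0.2 is the dtdlink VLAN as on these nodes:

# OLSR messages ride on UDP port 698; watch them arrive on the DtD VLAN
tcpdump -i eth0.2 -n udp port 698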
I'll have to dig through some code, as there's a threshold for dropping a cat5 link, as I recall at 95% LQ. But the logic seems to revert to the RF-path OLSR settings, with a best ETX of 1 and a link drop at something like 20% LQ. Normally this DtD link has an ETX pegged at 0.1.
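For context, stock olsrd computes the ETX cost of a link as 1 / (LQ x NLQ), which lines up with the table below: 1 / (0.890 x 1.000) ~= 1.123. So once the DtD link loses its pegged 0.1 cost, an 89% LQ makes it look more than five times as expensive as the 0.2 two-hop path through B.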
At this point, 'A' still thinks the ETX, or cost, to hop through B on to C is 0.1 + 0.1 = 0.2. So it flips to routing through node B, thinking that's now the lower cost. Then the A -> C link goes back up to 100% LQ, and about ~30 seconds later it flips back to normal. I watched this flip-flop ~6 times tracking the OLSR changes.
This route flip-flop coincides with the symptoms we see: connections dropping and delays in responding. We'll have to investigate further. Is there a hardware factor causing packets to be lost? All 3 nodes see the traffic from everyone else, so why is only one cat5 directional path showing lower LQ and not the others?
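The flip-flop is also visible from the routing side on node A; a sketch, assuming the BusyBox ip applet is available on the node:

# print node C's current route every 2 seconds; the next hop flips
# between the direct DtD path and node B during the event
while true; do ip route | grep 10.209.138.103; sleep 2; done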
Here's what I captured in the middle of the event on node 'A' (when the access to node 'C' drops):
Table: Links
Local IP        Remote IP        Hyst.  LQ     NLQ    Cost
10.63.24.253    10.17.252.60     0.00   1.000  1.000  0.100   (A -> B)
10.63.24.253    10.209.138.103   0.00   0.890  1.000  1.123   (A -> C)
10.63.24.253    10.176.139.122   0.00   0.862  0.925  1.252   (A -> RF remote node)
The dtdlink directional path from 10.209.138.103 -> 10.63.24.253 is showing the 89% LQ.
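A live view of that table can also be pulled from olsrd's txtinfo plugin; a sketch, assuming it is listening on its default port 2006 on the node:

# dump the current OLSR link table
echo /links | nc 127.0.0.1 2006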
Keith, as the next step I'd inspect the cabling and ports between the ToughSwitch and this node, 10.63.24.253 (dtdlink.AI6BX-8-PBE-M5-P2P-HC.local.mesh). Shielded cable? Corrosion in the ports/connectors? Crimping good?
10.209.138.103 is dtdlink.AI6SW-8-RM5-XW-Omni.local.mesh
Joe AE6XE
These newer devices with the gigabit ports are at risk of bouncing around on the rates and causing havoc. The PBE-M5-620 may also exhibit this symptom, although I have one at Pleasants Pk on an 8-port ToughSwitch and that combination is working fine with no special settings.
Andre, Conrad, did either of you create a ticket earlier on this issue from the Elsinore experience? Or do we need to submit one now?
Joe AE6XE
Glad to know we are all human and occasionally miss something in our reading. Good to know too that my initial thought was not that far off. :)
Andre