RB952Ui-5ac2nD Stealth Bug

AC0WN
RB952Ui-5ac2nD Stealth Bug

My apologies if this has been reported elsewhere.

Unit: MikroTik hAP ac lite router operating normally as a standalone tunnel client node, running firmware build 614-dbfef25 and configured for 13 Host Direct. POE and USB power were OFF, and the radio was operating as a 2.4 GHz AP.

Accessing this node across our network from another tunnel client, I changed the AP from 2.4 GHz to 5 GHz channel 157, saved the changes, and rebooted the node. The node failed to reappear on the mesh status page. I asked the node owner to reboot his node, which he did, and still the node failed to appear on the mesh status page. BUT … from the perspective of the node owner everything was normal. He could see the entire network from his node, and his VoIP phone worked fine and could use the services of the PBX. He had access to all network services. No other node on the network could see the “stealth” node, but the node could be pinged, and the node’s status page could be called up in a browser using its IP.

After flashing the node with nightly build 713 the problem cleared and the node was returned to normal operation.

A data dump is attached.

73,
julie
ac0wn

AA7AU
MikroTik hAP ac lite running 572

As another data point (but a bit fuzzy): I have a MikroTik hAP ac lite router running 572 (first and only install) in a similar configuration using the 2.4 as AP with tunnel installed and active. I have had some issues with rebooting the node where it seems like it doesn't properly fully reboot unless the LAN ethernet connection(s) are disconnected for the boot. Haven't reported it as a bug because I wasn't paying close enough attention to the details when it happened and I'm now trying to keep it stable for regular use.

I love this little device and hope that we will sometime be able to use the 2.4 or 5 radio to connect the WAN side outbound. Also will be good to be able to use the USB connection for networking as well.

This unit is (or will be) the perfect basis for a Go Kit! In the meantime, I don't bother it much with reboots.

Good luck,
- Don - AA7AU

K5DLQ
Update to the latest nightly build and retest.  572 is pretty old now.
 
AE6XE
I was in a hotel last week and configured the hAP ac lite to do a WAN AP client connection to  the hotel AP.   I was happily tunneled to the SoCal Mesh network and had a couple of laptops on the LAN ports of the hAP ac lite.   Both worked and the hotel only sees it as one device :) .    I coded the UI for this setting on the airplane.  Just need to run it live, test it, and work out the issues.   What's the saying, necessity drives invention :) .

hAP ac lite, the swiss army knife of mesh nodes...

Joe AE6XE
al0y
Joe, 

I love this. Being able to use one of the RF radios on this little router board as an uplink is just great. 
Is this coming in any nightly build soon? 
If so, I will wait. Otherwise, I would really appreciate it if you could share with us how this was achieved.
AE6XE
Getting close...
AE6XE
The symptoms you describe, if I understand correctly, are the same as this bug: https://github.com/aredn/aredn_ar71xx/issues/204

A change was made in a recent build that has either reduced the occurrence of these symptoms or prevented them from showing. If you can reproduce this bug with a current build, please post a support download from the node in question. We hope to never see these symptoms...

Joe AE6XE
AC0WN
Clarification

Hi Joe,

Yes, I was aware of the bug you mention and have seen that one a number of times, when it only displays the IP. I thought this one was interesting in that it displayed nothing! ... No hostname, no IP .... and yet it had full network access.  :)

73,
julie
ac0wn

AE6XE
Got it. I'm in transit, back home tomorrow night, and will be able to look at the support dump first of the week. Is the support dump from the node that others do not see in mesh status, OR from a remote node that does not see the IP address? (It's helpful to have a support dump from both sides of the fence.)

Joe AE6XE
AE6XE
Julie, following up on this issue. After upgrading to a current nightly build, were there any issues? I'm assuming no news is good news and all is well?

Joe AE6XE
AA7AU
I upgraded to Nightly Build 713

I upgraded to Nightly Build 713 on my MikroTik HAP and am still seeing the problems I referenced above.

In short, running the unit with the Mesh RF *off* (and using the 2.4 as AP) with an active ethernet WAN cable connection, if I use the GUI to reboot (no matter the change), it will stop as normal, start, flash the front 2 lights, but then nothing else, and hang indefinitely (hard power cycle only way out). It does NOT do this if the ethernet cable is POE power only! Will do some more testing later, but going to the most recent build did not fix the problem.

TIA,
- Don - AA7AU

 ps: Joe, I like that you have adopted my "Swiss Army Knife for the Mesh" slogan for this terrific device. I go to bed each night with a prayer on my lips that you will soon release the WAN AP client connection enhancement ... and maybe even a similar approach to using the USB connection. Thanks for all that you do.
 

AE6XE
Don, unfortunately, I'm not able to reproduce. Here are the specifics of my test case:

1) hAP ac lite is powered with 12v on 1/8" jack
2) only a WAN cat5 to home network (no POE power to the mikrotik on this cable)
3) Mesh RF off
4) 2GHz LAN AP on and connected with an ipad to do 'reboot' button on setup page
5) no other cat5 cables connected
6) also accessed from home network over WAN port and did a 'reboot' too.

Anything I'm missing?

Joe

"The swiss army knife of nodes" (tm - Don AA7AU)

 
AA7AU
Perhaps it's a hardware problem?

Just got back from a mtn top, where as usual the HAP was a terrific help. I had it setup with the 2.4 Mesh radio on, the 5.8 AP on, the tunnel still installed but nothing enabled, and meshchat still installed.

flash = 8084 KB  /tmp = 30208 KB   memory = 26984 KB  firmware version 713-f833b38

Back here now, I connected as follows: Using the comes-with 24v wallwart into the barrel connector, a normal ethernet cable (no POE) from a D-Link green switch (actively connected to router) into the WAN port, and a single ethernet cable into the first LAN port from Win7 laptop. Power-up stalls with two front lights on and nothing else (waited long past standard 25-second pre-POST wait).

Disconnect power, remove the WAN cable, reconnect power and it boots right up. WAN connection works fine on cable re-insert.

I'm starting to think that I have some sort of unique hardware problem as this is NOT related to turning off mesh radio or any other unusual settings. Maybe that "green" switch needs an A/B comparison just to rule it out.

Thanks for all your help in this,
- Don - AA7AU
 

"Yeah, we can probably make that work" (tm - Joe AE6XE)

AE6XE
:) 

Might be some interaction between your D-Link switch and the hAP ac lite? Daisy chain a dumb switch in the middle and see if it goes away? Or the cat5 or a connector may be shorting the POE pins; try swapping out the cable?

Joe AE6XE
AA7AU
Next steps ...

OK, first I simply swapped the cable, no luck. Then I ran a cable direct to the router by-passing the "green" switch, no luck. (Assuming that the reboot would show as active with flashing lights around 25-30 seconds later).

Then I tried to find a dumb switch (vanishing breed here) and found an old Netgear EN104tp *HUB* (they don't get much dumber than that). Hooked it between MikroTik WAN port and switch, NO LUCK. But there seems to be one hell of a lot of traffic outbound trying to do something out that ethernet connection.

What the heck is that WAN port trying to do before/during POST?

Wait ... as I was typing this (long after the obligatory 25-30 second startup hiatus) it finished rebooting - after a long session of blinking lights on the hub ports.

I seem to recall that I have waited extremely long times before with no luck doing this previously. What is the maximum time-out for POST et al?
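
One way to put a number on the boot time, instead of watching the lights, is to poll the node from another machine on its LAN. This is a minimal sketch, not part of the firmware. It assumes a `ping` that accepts `-c` (count) and `-W` (reply timeout in seconds, which varies between ping implementations) and a `date` that supports `+%s`. The function name `wait_for_node` and the 600-second default limit are hypothetical choices.

```shell
# Hypothetical helper: poll a node with ping until it answers, then print
# how many seconds that took. Returns 1 if the limit is reached first.
# Usage: wait_for_node <ip-or-hostname> [max_seconds]
wait_for_node() {
    target="$1"
    max="${2:-600}"              # give up after this many seconds (assumed default)
    start="$(date +%s)"
    while :; do
        # One echo request, short reply timeout; output is not needed.
        if ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
            echo "$(( $(date +%s) - start ))"
            return 0
        fi
        now="$(date +%s)"
        if [ "$(( now - start ))" -ge "$max" ]; then
            return 1
        fi
        sleep 2
    done
}
```

Start it once the node has actually gone down after the reboot, for example `wait_for_node <node-ip> 900`, otherwise the first ping may succeed before the restart begins.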

TIA,
- Don - AA7AU

AE6XE
This is good info. Open a command line on the MikroTik and run the command "rbcfg" to see the help and the options for setting bootloader parameters. There are a couple that might be having an impact:

1)  boot_delay: from 1 to 9 seconds
2)  boot_device: check options, may be trying to boot over ethernet first, then from flash

On my Mikrotik:
"rbcfg get boot_delay" shows "2" 
"rbcfg get boot_device" shows "nandeth"

Then do the "rbcfg set <option> <value>" accordingly.
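
The checks above can be gathered into one pass. This is a minimal sketch, assuming the `rbcfg` utility is on the node's PATH and accepts the `get <option>` form quoted above. The function name `show_boot_settings` is hypothetical.

```shell
# Hypothetical helper: print selected bootloader settings as name=value lines.
# Assumes rbcfg supports "get <option>", as in the commands quoted above.
show_boot_settings() {
    # Default to the two options discussed here; pass other names as arguments.
    [ "$#" -gt 0 ] || set -- boot_delay boot_device
    for opt in "$@"; do
        if value="$(rbcfg get "$opt" 2>/dev/null)"; then
            echo "$opt=$value"
        else
            echo "$opt=<unavailable>"
        fi
    done
}
```

Running `show_boot_settings` on the node prints one line per option, which is easy to paste into a forum post for comparison with another node.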

Joe AE6XE
AA7AU
Nothing obvious

root@AA7AU-hAPaclite-rover:~# rbcfg get boot_delay
2
root@AA7AU-hAPaclite-rover:~# rbcfg get boot_device
flash
root@AA7AU-hAPaclite-rover:~# rbcfg get boot_protocol
bootp
root@AA7AU-hAPaclite-rover:~# rbcfg get booter
regular

Not sure what, if anything, to change.

TIA,
- Don - AA7AU
 

AA7AU
ifconfig ?

root@AA7AU-hAPaclite-rover:~# ifconfig
br-lan    Link encap:Ethernet  HWaddr CC:2D:E0:C2:E1:CD
          inet addr:10.30.28.209  Bcast:10.30.28.223  Mask:255.255.255.240
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3697 errors:0 dropped:414 overruns:0 frame:0
          TX packets:1776 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:525736 (513.4 KiB)  TX bytes:264976 (258.7 KiB)

eth0      Link encap:Ethernet  HWaddr CC:2D:E0:C1:E1:C7
          inet addr:192.168.205.71  Bcast:192.168.205.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20719 errors:0 dropped:420 overruns:0 frame:0
          TX packets:875 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1627694 (1.5 MiB)  TX bytes:62717 (61.2 KiB)
          Interrupt:4

eth1      Link encap:Ethernet  HWaddr CC:2D:E0:C2:E1:CD
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3697 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7949 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:592282 (578.4 KiB)  TX bytes:963120 (940.5 KiB)
          Interrupt:5

eth1.0    Link encap:Ethernet  HWaddr CC:2D:E0:C2:E1:CD
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3697 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1776 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:525736 (513.4 KiB)  TX bytes:264976 (258.7 KiB)

eth1.2    Link encap:Ethernet  HWaddr CC:2D:E0:C2:E1:CD
          inet addr:10.194.225.205  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3077 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:332498 (324.7 KiB)

eth1.3975 Link encap:Ethernet  HWaddr CC:2D:E0:C2:E1:CD
          inet addr:10.193.225.205  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3096 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:333850 (326.0 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:11926 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11926 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:717109 (700.3 KiB)  TX bytes:717109 (700.3 KiB)

wlan1     Link encap:Ethernet  HWaddr CC:2D:E0:C1:E1:CD
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1902 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:460391 (449.6 KiB)
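
A side note on reading the br-lan entry in the listing above: the mask 255.255.255.240 is a /28, which is 16 addresses (here 10.30.28.208 through 10.30.28.223). Taking away the network address, the broadcast address, and the node's own LAN address leaves 13, which matches the "13 Host Direct" LAN setting mentioned at the start of the thread. The arithmetic can be sketched as follows. The function name `lan_client_slots` is hypothetical, and the shortcut only holds for masks of 255.255.255.x.

```shell
# Hypothetical helper: number of LAN client addresses left in a subnet after
# removing the network address, the broadcast address, and the node itself.
# Only valid for masks of the form 255.255.255.x (a /24 or smaller subnet).
lan_client_slots() {
    mask="$1"
    last="${mask##*.}"            # final octet of the mask, e.g. 240
    total=$(( 256 - last ))       # addresses in the subnet, e.g. 16
    echo "$(( total - 3 ))"
}
```

For example, `lan_client_slots 255.255.255.240` prints 13 and `lan_client_slots 255.255.255.248` prints 5.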

- Don - AA7AU

AE6XE
Nothing in the settings is jumping out as a possible problem. As a shot in the dark, you could change the 'boot_device' to match what mine has, "nandeth". However, I should probably change mine to match yours, "flash". This device doesn't have "nand" flash; rather, it has "nor". The setting I'm using says to try booting from the "nand" flash first, then, if unsuccessful, to try booting from the "eth" or network. It still figures out how to boot the nor flash...

Joe AE6XE
K3PGM
boot_device=flash => slow boot

Joe:

FYI, I recently tried to upgrade a hAP ac lite to v3.20.3.0, but the upgrade process appeared to fail, and I was left running 3.19 again. I then did a fresh install of 3.20.3.0 using PXE, which took a very long time but eventually succeeded. Afterwards, the device behaved normally except that it was taking about 5 minutes to boot.

I thought it might be a hardware problem, so I performed another installation of 3.20.3.0 on a second hAP ac lite fresh out of the box. Again it succeeded, but with a similarly long boot time.

After finding this thread, I checked and saw boot_device=flash on both devices. I changed them to boot_device=nandeth and now they boot in about 30 seconds.

The help message for rbcfg describes the "flash" option as "boot in flash configuration mode". I suspect that "flash configuration mode" is some sort of maintenance mode that is eventually timing out and booting the existing content of the NOR flash. And that "nand" actually refers to the NOR flash if that's what you have...
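
The change described above can be sketched as a small guard that only touches the setting when it is still "flash". It assumes the `rbcfg get` and `rbcfg set` forms quoted earlier in the thread. The function name `fix_boot_device` is hypothetical. My understanding, which should be checked against the `rbcfg` help output on the node, is that a separate `rbcfg apply` step may be needed to write the change to flash. That step is left out of the sketch.

```shell
# Hypothetical helper: switch boot_device from "flash" to "nandeth" and
# leave any other value alone. Uses only the get/set forms quoted in the thread.
fix_boot_device() {
    current="$(rbcfg get boot_device)" || return 1
    if [ "$current" = "flash" ]; then
        rbcfg set boot_device nandeth || return 1
        echo "boot_device changed: flash -> nandeth"
    else
        echo "boot_device left as: $current"
    fi
}
```

Run `fix_boot_device` once on the node, confirm the result with `rbcfg get boot_device`, then reboot and time the startup.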

- Paul, K3PGM

AE6XE
Was the device purchased and received with this boot_device=flash setting? If so, we may need to document changing this setting in the installation instructions so that others avoid stepping in this pothole. There are 2 partitions in flash, "routerboot" and "routerboot2". As I recall, only one of these can be upgraded to newer versions and the other is intended to never change, so that the device is always recoverable. There's some vulcan pinch to choose which routerboot to use. Maybe this is related to these boot settings.

Joe AE6XE
K3PGM
FYI: boot_device=flash setting on new hAP ac lite

Joe:

I just pulled another brand-new hAP ac lite off the shelf (where it's been sitting since March) and loaded the latest AREDN firmware. And yes, it had the boot_device=flash setting. So, indeed, at least some of them were shipped that way, unless something about the AREDN installation process is doing it.

I bought this unit on Amazon, sold by EURO DK LTD, on 10 Mar 2020.

73,
- Paul, K3PGM

K6CCC
Maybe similar issue with hAP

I may be having a similar issue.
Four days ago I received a hAP ac Lite and flashed it with AREDN 3.19.3.0. I found that sometimes it would not boot. It took a few tries to find the pattern, but I found that, with 100% reliability, if port 5 (DTD link) is connected, the hAP goes into a continuous reboot cycle. If port 5 is not connected, it always boots properly. For all tests, the WAN port was connected, and I tested with a LAN port both connected and not connected (made no difference). After reading the first part of this thread, I updated to 883-71325a9 and found exactly the same condition. I have uploaded three support files - I THINK I have accurate descriptions...
If preferred, I can start a new thread...

Jim K6CCC
 

AE6XE
Jim, sorry, I've not soaked in the history on this thread, but a thought came to mind on the symptoms you are seeing. The hAP defaults to POE passthrough on Port 5 until AREDN boots linux and applies the AREDN POE passthrough setting for that port. Is it possible that the cat5 and the other device it is connected to are not applying the correct load to the POE passthrough? If so, this might be causing the symptoms you are seeing.

Joe AE6XE 
K6CCC
Good point about POE.  At the moment it is just plugging into a port on my MikroTik CSS326 switch.  I will try it with some other device and also with a custom cable that does not have the wires used by the passive POE.  I’ll report back later today...
 
K6CCC
The POE was it.
Thanks Joe.  The POE was it.  Built a custom cable that only has the two pairs used for data and it's working fine.
 
