
HAP AC Lite keeps unadvertising services despite them being up

KI6TSF

Hi,

The services I have defined on my HAP AC Lite (MikroTik RouterBOARD 952Ui-5ac2nD) keep disappearing from the mesh status page every few hours. They run on a Raspberry Pi on the LAN, which is reachable from both the RF side and the LAN side at all times. "Provide default route to LAN devices" is set to ON in the Advanced Configuration panel.

This didn't happen when my services were advertised from a LiteBeam; it only happens with the HAP. The firmware version is 3.23.4.0.

Any idea what the problem might be?

Thanks,

Bernard KI6TSF

K6CCC
What kind of services? The newer firmware versions check that advertised services are actually reachable and de-advertise them if they are not. As I recall, the node shows some sort of indication (I can't remember what), but does not propagate the service.
 
KI6TSF
I have those services running

The types of services I have are http://, telnet:// and mqtt://.

I have them running on a Raspberry Pi and port-forwarded on the HAP AC Lite.

Webcam towards Black Mtn http://ki6tsf-hap-1.local.mesh:8081/
N0ARY Packet BBS (login as user bbs then type callsign) telnet://KI6TSF-HAP-1.local.mesh:10023/
MeshChat http://ki6tsf-hap-1.local.mesh:8082/meshchat
MQTT Broker (use topic /yourcallsign/whatever) mqtt://KI6TSF-HAP-1.local.mesh:1883/
Speed Test http://ki6tsf-hap-1.local.mesh:8082/speedtest

They are up and reachable at all times on my LAN at their original IP and port, and from the RF side via their forwarded ports.

When the HAP AC Lite decides to de-advertise them, it shows an exclamation mark (!) to the right of the "Advertised Services" list in Setup.

This always worked fine when all those services were port-forwarded from another device (a LiteBeam).

Bernard KI6TSF
 

K6CCC
I'm going to be very picky about this question.

Re: "The types of services I have are http://, telnet:// and mqtt://. I have them running on a Raspberry Pi and port-forwarded on the HAP AC Lite."

You said they are port forwarded.  Normally that implies NATing.  Is the RasPi directly connected to a LAN port on the hAP and therefore has a mesh reachable 10.x.y.z address, or is it connected to or via something else (a router for example)?  Are you actually simply advertising the services or is there actual port forwarding going on the hAP?

Re: "When the HAP AC Lite decides to de-advertise them, it shows an exclamation mark (!) to the right of the 'Advertised Services' list in Setup."


I knew there was some indication, but could not remember what it was. It takes an hour or two for a service to de-advertise.

The ability to check whether a service is actually reachable was added some time last year. It was a response to the HUGE number of advertised services that were not actually there. When it was first added to the nightly builds, there were some issues with certain types of services not being detected as reachable, but common types such as web pages were not an issue. As I recall, one of the first things the node tries to do is ping the device. Ping does not test TCP ports, so that may be an issue depending on what you are actually doing. Additional tests are performed if the ping fails, but I do not know the details.
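The following is a minimal Lua sketch of a reachability check of this kind, to make the behaviour discussed in this thread concrete. It is not the AREDN code. It assumes, as later posts in this thread suggest, that the host must answer a ping and the advertised port must accept a connection. The delay before de-advertising is modelled with a hypothetical `GRACE_SECONDS` value chosen to match the "hour or two" mentioned above. The `probe` table stands in for real ping and TCP tests.

```lua
-- Hypothetical sketch of a service reachability check (not AREDN's code).
-- GRACE_SECONDS is an assumed value approximating the observed delay.
local GRACE_SECONDS = 2 * 60 * 60

-- probe.ping(host) and probe.tcp(host, port) are supplied by the caller
-- and return true on success; in this sketch both must pass.
local function is_reachable(svc, probe)
  return probe.ping(svc.host) and probe.tcp(svc.host, svc.port)
end

-- Returns true while the service should stay advertised.
-- last_ok[svc.name] records the epoch time of the last successful check,
-- so a service is only dropped after failing for GRACE_SECONDS or more.
local function should_advertise(svc, probe, last_ok, now)
  if is_reachable(svc, probe) then
    last_ok[svc.name] = now
    return true
  end
  local t = last_ok[svc.name]
  return t ~= nil and (now - t) < GRACE_SECONDS
end
```

With this design, a single failed check does not drop a service. Only a failure lasting longer than the grace period does, which is consistent with services disappearing a few hours after the checks start failing.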
 
KI6TSF

Re: "You said they are port forwarded. Normally that implies NATing. Is the RasPi directly connected to a LAN port on the hAP and therefore has a mesh reachable 10.x.y.z address, or is it connected to or via something else (a router for example)? Are you actually simply advertising the services or is there actual port forwarding going on the hAP?"


Yes, you are right. I initially forgot to enable NAT mode on that node. I switched to NAT mode, but that didn't fix this particular issue; the services were de-advertised again this morning.

The RasPi is connected to a LAN port on the HAP; there is no router in between. There is actual port forwarding on the HAP and it seems to work: I tested it multiple times from the RF side using another node. It's just the advertising that goes away after a few hours.

I will check the HAP log files for errors; hopefully the advertising/de-advertising code logs some entries.

Thanks,

Bernard KI6TSF
K6CCC
I don't know if running NAT mode is causing your problem.  However, is there a particular reason you are using NAT mode as opposed to "direct" mode?
KI6TSF
Yes, I prefer NAT to prevent direct access to other TCP/IP services, such as sshd, or to services I plan to set up and test on the LAN first. Also, I'm going to migrate my RPi 4 services to an RPi CM4 (inside a Seeed reTerminal, which is basically a Raspberry Pi with an integrated LCD touchscreen), and I think using NAT with port forwarding will make the transition simpler.
KI6TSF

OK, so on firmware 3.23.4.0 on my HAP AC Lite, the hourly check-services crontab job systematically fails. It calls /usr/local/bin/olsrd-config, which passes the ping checks but systematically fails all the HTTP checks and the TCP fallback checks. The reason is that it uses the NAT hostname and ports defined in each port forwarding rule, and those ports are not accessible from the HAP itself. The HAP cannot reach any of the local ports it forwards, even though the port forwarding rules are active and the ports are reachable from the LAN.

The FW4 firewall config does not allow the device's own source address to access those ports. They are all accessible from the LAN and from the RF-side WAN (DtD in my case), but not from the host itself via any of its own interface IP addresses (127.0.0.1, 10.x.x.x, 172.x.x.x, 192.168.x.x). The tests in olsrd-config therefore always fail. I have debugged and traced all the Lua code and confirmed that this is the issue on my HAP. It is what causes all the services to be de-advertised, while in fact they are all reachable from anywhere.
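KI6TSF's finding suggests that the checker could probe the LAN-side destination of a port-forwarding rule instead of the node's own forwarded port, which the node cannot reach through its own NAT. The Lua sketch below only illustrates that idea; it is not a patch to `olsrd-config`. The rule fields `outport`, `lanip` and `lanport` are hypothetical names and may not match the firmware's configuration keys.

```lua
-- Hypothetical sketch: pick the address a node-local checker should probe.
-- A forwarded port on the node itself is not reachable from the node
-- (no hairpin NAT), so for forwarded services probe the LAN host directly.
-- Field names (port, outport, lanip, lanport) are illustrative only.
local function probe_target(svc, forwards)
  for _, rule in ipairs(forwards) do
    if rule.outport == svc.port then
      return rule.lanip, rule.lanport
    end
  end
  -- No matching forwarding rule: probe the advertised host and port as-is.
  return svc.host, svc.port
end
```

For example, a rule forwarding node port 10023 to port 23 on a Pi at an example LAN address of 192.168.1.10 would be probed at 192.168.1.10:23.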

Bernard KI6TSF

K6CCC
You are way beyond my ability to help. I can spell Linux and not much more than that!
Submit a bug report on GitHub.

 
KI6TSF
Thanks anyway! I'll submit a bug report.
W4JWC
Update please
I'm curious if there has been a resolution to this. Another ham and I are having the same issue. Thanks
AI4Y
Also experiencing the delisting issue

W4JWC and I have seen this delisting issue on 3.23.4.0 and 3.23.8.0 for a Winlink service and an AXIS IP camera. The services resolve on local mesh nodes and over the tunnel. They are running, but ping and arping fail when called from /usr/local/bin/olsrd-config. The services are port-forwarded and resolvable but not pingable, so they get delisted. They are still in /etc/config/services.

They are also listed in /tmp/service-validation-state with recent epoch timestamps, but when olsrd-config runs, it calls /usr/local/bin/olsrd-namechange, which pulls (delists) the service when it writes /var/run/hosts_olsr.stable.

I noticed the Lua manager has been choking when trying to kill a process.

root@AI4Y-160-201-233:~# cat /tmp/manager.log
09/26 11:33:32: linkled: Terminating manager task: linkled
09/26 11:33:32: periodic-metrics: Terminating manager task: periodic-metrics
09/27 04:10:45: namechange: /usr/local/bin/mgr/namechange.lua:126: bad argument #1 to 'kill' (integer expected, got nil)
09/27 04:14:50: namechange: /usr/local/bin/mgr/namechange.lua:126: bad argument #1 to 'kill' (integer expected, got nil)
09/27 04:17:50: namechange: /usr/local/bin/mgr/namechange.lua:126: bad argument #1 to 'kill' (integer expected, got nil)

The kill errors, caused by not passing a valid PID, happen when it executes the following code in /usr/local/bin/mgr/namechange.lua:

function dns_update()
    local pid = capture("pidof dnsmasq")
    if pid ~= "" then
        nixio.kill(tonumber(pid), 1)
    end
end
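One plausible cause of the nil has not been confirmed in this thread. If `pidof` returns several space-separated PIDs, or only whitespace, Lua's `tonumber()` returns nil for that string, and the `pid ~= ""` guard does not catch it. A more defensive version could parse each PID separately, as in the sketch below. Here `capture` and `kill` are passed in as parameters so the logic runs outside a node. On the node they would correspond to the firmware's `capture()` helper and the `nixio.kill` call shown above. This is an illustration, not the upstream fix.

```lua
-- Sketch of a more defensive dns_update() (not the upstream code).
-- tonumber("123 456") and tonumber("\n") both return nil, which is what
-- produces "bad argument #1 to 'kill' (integer expected, got nil)".

-- Extract every PID from pidof output as a number.
local function parse_pids(output)
  local pids = {}
  for pid in (output or ""):gmatch("%d+") do
    pids[#pids + 1] = tonumber(pid)
  end
  return pids
end

-- capture(cmd) returns command output; kill(pid, sig) sends a signal.
local function dns_update(capture, kill)
  for _, pid in ipairs(parse_pids(capture("pidof dnsmasq"))) do
    kill(pid, 1) -- signal 1 (SIGHUP) makes dnsmasq re-read its hosts files
  end
end
```

If `pidof` finds no process, the loop simply does nothing, so `kill` is never called with nil.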

KN6PLV
It would be good to get more information about these service delistings as ... obviously ... this isn't how it should be.
In particular, I'm interested in a setup where arping is failing and yet you can still contact the device when you try.
AI4Y
Yes, I can still contact the device after it's delisted.

I found that if I connect the device (an Axis IP camera) directly to my ToughSwitch Pro, where all the VLANs are in place (1 - WAN, 2 - DtD, 11 untagged), with the camera on an untagged port with WAN access, it successfully answers pings from the AREDN node and no longer gets delisted. Honestly, I'm not surprised. I have 3 networks at my QTH, and the AREDN 10.x subnet had to cross into the 192.x subnet to reach the Axis IP camera, which is assigned a 10.x IP in the same AREDN subnet. Nasty, I agree, so I took steps to connect it directly to the ToughSwitch. However, W4JWC still seems to have trouble with his Windows PC running a Winlink service being delisted when it is directly connected to a MikroTik hAP ac lite (RB952Ui-5ac2nD-US).
 

K7EOK
Is this resolved? Having same problem
I have a HAP AC Lite node currently running 3.23.12.0, advertising services on LAN-connected computers. My LAN goes through a dumb switch that provides Ethernet to two laptops. One is running Linux and provides FileGator, Citadel, and a still image from a camera. The second laptop, added recently, runs Windows 10 and hosts a Winlink Post Office gateway.

The issue is similar to the one described in this thread. The Port Forwarding link for the Winlink gateway gets dropped overnight, and an exclamation point appears when I investigate. The actual Winlink service never goes down. Even when the HAP acts up, I can connect fine using the previous Network Server Settings in Winlink Express, and the Windows laptop on my node works and connects. The issue is that when I update the AREDN MESH Node List in Add Server (as if I were outside my node looking to connect), it's no longer in the list.

When I simply click "save changes" in the Port Forwarding page, even though I made no changes ... that fixes the problem and the Winlink service is advertised again.

My "Provide Default Route" in advanced settings is off and I'm running direct not NAT.

What's going on?  I'd like this service to stay up.  The service is on the computer just fine, only the HAP decides it's missing.

Ed
 
w6bi
Advertised Services
Ed, update your hAP ac Lite to the latest nightly build and see if it keeps happening. If so, open a ticket on the AREDN GitHub instance and attach a support file.
73
Orv W6BI
 
K7EOK
Updated firmware to 20240526-78fb72b ... rebooted, etc., and it was working. Left it for a few hours, came back, and it's not linked.

DHCP lease is still correct.  Winlink Express using the previous add server works.   Ping to IP address 10.X.X.X does not return anything, nor does ping of the hostname.  I'm not sure if this should be pingable.  Winlink Express has no problem finding the service and connecting, but if I delete the server then I can't find it again in the AREDN list.

When I save changes, the link reappears just like before.

Ed
 
AJ6GZ
Pings
I would allow pings to the Winlink host. That's one of the checks the node does to make sure the service is up.
 
K7EOK
How do you do that?  The host computer is a Windows 10 laptop.

EDIT

OK, found an article on modifying Windows Defender ... done.  Now it responds to ping.  I'll see if the service stays up.  I am curious if this does create a security issue for the laptop.  Thoughts?
AJ6GZ
Nah
Not really an issue. You could lock it down in Windows Firewall to reply only to the node if you really wanted to, but it's unnecessary, and a regular user might want to ping it to see if it's up.

 
K7EOK
Meanwhile that firewall setting didn't fix it.  Service still unadvertises itself after several hours.

Ed
 
nc8q
Ed:
What does your service look like? 
My pilinbpq/WinlinkPostOffice is advertised like this:
73, Chuck

 
K7EOK
The conversation moved over to GitHub for most of the day. Tim has been working on this with me. His latest suggestion is that Winlink should not be advertised as an http service, so it's currently advertised on my server as winlink://hostname:port, and we will see whether OLSR decides to kick it off again. I'll report back here when there is something conclusive.

FWIW I did change the incoming IPv4 rules for both public and private networks and that did not solve anything.  I can ping the ip address and the host name from my position on the actual LAN the server is on.  Tim could not ping these from his location outside our local mesh (supernode).  I haven't yet found out if a user not on my local LAN can ping from elsewhere on the local mesh ... and expect to test that tomorrow.

Remember, Winlink has always been working. What did not work was keeping the advertised service up, so when someone added an AREDN Winlink server using the AREDN Mesh Node List within RMS Express, my node wouldn't show up. A Winlink node no one can find is fairly useless ...

Ed
 
AJ6GZ
I'll cross post this here just for reference...

The Winlink program needs a full URL to be able to find the service and correctly populate its server list.

Mine is: winlink://AJ6GZ-1-winlink.local.mesh:8772/

So the values are:
Name: Winlink Post Office   (The program searches for winlink and/or post office)
Link: [X] Checked
URL: winlink
Host: AJ6GZ-1-winlink
Port: 8772
Nothing after the trailing ' / '

Yes, 'winlink' is a bogus URI scheme, but we don't want people clicking on http, telnet, ftp, etc., or any of the other schemes that back in the day would launch external programs from browsers. But you must have something that looks like a "link" for Winlink to populate the hostname and port number correctly. Otherwise, you'll see it in the list and can add the server, but it won't work correctly.
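The small Lua sketch below shows why the link needs the full `scheme://host:port/` shape. It extracts the host and port from an advertised link the way a client might. It is a hypothetical illustration, not Winlink Express's actual parser, and the function name `parse_service_url` is made up for the example.

```lua
-- Hypothetical sketch: split an advertised service link into its parts.
-- Accepts scheme://host:port with an optional trailing slash and nothing
-- after it; anything else returns nil (not Winlink Express's real logic).
local function parse_service_url(url)
  local scheme, host, port = url:match("^(%a[%w+%.%-]*)://([^:/]+):(%d+)/?$")
  if not scheme then
    return nil
  end
  return { scheme = scheme, host = host, port = tonumber(port) }
end
```

With AJ6GZ's link, `winlink://AJ6GZ-1-winlink.local.mesh:8772/` yields host `AJ6GZ-1-winlink.local.mesh` and port 8772. A bare `hostname:port` without a scheme, or a link with a path after the port, yields nil.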

I also allow pings to this host.

You can test by telnetting or SSHing into your node, or better, into someone else's node further away, then:

  1. ping the host, preferably by hostname.
  2. run 'telnet hostname 8772' (or whatever your port number is) and see if you get "callsign:" in return.

If these are true, the service checker and users should be happy.
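AJ6GZ's two manual checks can also be written as a tiny helper, for example to summarize results collected from a remote node. This is a hypothetical Lua sketch. The caller supplies whether the ping succeeded and the first text received after connecting to the service port. The helper does not open any connections itself.

```lua
-- Hypothetical helper encoding the two manual checks above.
-- ping_ok: result of pinging the host by name.
-- banner: first text received from the service port (nil if no connection).
local function winlink_checks(ping_ok, banner)
  local problems = {}
  if not ping_ok then
    problems[#problems + 1] = "host does not answer ping"
  end
  -- Case-insensitive, plain-text search for the RMS-style prompt.
  if not (banner and banner:lower():find("callsign:", 1, true)) then
    problems[#problems + 1] = "no 'callsign:' prompt on the service port"
  end
  return #problems == 0, problems
end
```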

Re:

" Winlink worked fine with the previous firewall rules, the only thing was it wouldn't show up in the AREDN Mesh Node list so users would not be able to find it. "

K7EOK
Well, that appears to be the fix. The service is still advertised.

It seems like there should be a section in the Docs area of the arednmesh.org website on how to properly set up Winlink. Thanks, everyone, for troubleshooting this.

Ed
 
