I would like to request a feature. The HTML links generated by the built-in status webserver on port 8080 should refer to the IP addresses of other nodes rather than their names. This will make it much easier to use from a web browser on a client that isn't using the DNS resolver in the node, at least until a way can be found to make mesh names resolve in the "real" DNS by placing them under a global TLD. This is an issue for me since I don't want to dedicate a client system to the mesh; I much prefer to use the same machine simultaneously on the mesh as on the "real" Internet.
The port 1978 webserver built into olsrd already does this; there are distinct links for a node's name and its IP address. You could either do the same on the port 8080 webserver, or simply show the names as you do now while having the actual hyperlink point to the IP address.
Have you tried to enable the "Disable Default Route" option?
The default route isn't the issue.
As I said before, I am unwilling to treat the mesh as a standalone, isolated network; I think this squanders a lot of its potential. I want to talk to it as though it were part of the "real" Internet. I don't want to have to unplug (literally or figuratively) a client from one network and plug it into the other.
Yes, I know that the mesh's use of RFC-1918 private address space, e.g., network 10, is another stumbling block to seamless Internet integration but IPv6 will solve that particular problem.
My home network is too complex for a garden-variety SOHO router/gateway, so I custom-configured a small Debian Linux box named maggie.ka9q.net as my primary home router. I've got two upstream ISPs, AT&T Uverse and Time Warner Road Runner, and five outbound paths: two IPv4 and three IPv6 (native Uverse, native RR, and a Hurricane Electric IPv6-in-IPv4 tunnel). This took a lot of policy routing work.
Maggie runs olsrd so it appears as a native mesh node (not an HNA host) on the San Diego network map. This makes it easy to treat the mesh as a sixth outbound "ISP", albeit without a default route (it only gets network 10). With NAT and a web proxy cache in place, I can talk to any reachable network node from any of my local clients (e.g., my Mac laptop) without having to configure them in any special way.
Except for DNS resolution. maggie runs ISC bind, a standard DNS server and resolver. I do plan to hack it to recognize the .mesh pseudo-TLD and serve DNS queries from the entries cached in /run/hosts_olsr (available because maggie natively runs olsrd). This will solve the immediate problem, provided all local clients use it as their resolver instead of doing their own resolution or using an ISP's or Google's public resolver. That can only be fixed by putting the mesh namespace under a real TLD.
But it still seems like a very straightforward change to have the Ubiquiti firmware webserver running on port 8080 produce hyperlinks that refer to the IP addresses rather than host names in the .mesh pseudo-TLD.
By all means keep host names in the visible link. I.e., the HTML snippet would look like this:
<a href="http://10.18.74.69:8080/">kg6equ-mchs.local.mesh</a>
Or do what the olsrd-provided server on port 1978 already does: provide two side-by-side links, one with the name and one with the IP address:
<a href="http://kg6equ-mchs.local.mesh:8080/">kg6equ-mchs.local.mesh</a> <a href="http://10.18.74.69:8080/">10.18.74.69</a>
Since hostnames are already flooded to every node in the mesh, I really can't see a downside to this.
Working DNS has been considered a core requirement for a working deployment. One's PC is not considered 'working' on the mesh unless it can resolve the names. To allow mesh hardware to move around and be swapped from one piece of hardware to another (with a different IP), one should not rely on IP addresses. In addition, if we move to IPv6, names become mandatory, as the addresses will be far too long and hard to remember.
The DNS space is intentionally not in the gTLD structure, and I don't think there will ever be a time when it is, as the mesh is intended to be an isolated network that is neither dependent upon nor influenced by outside resources. This is no different than .local has been on many Windows domains for decades. We also don't ask Google to "give me an IP link to click on"; I'm not sure this request is much different.
I've been running with "Disable Default Route" for quite some time now and am able to be multihomed between my traditional home network and my mesh network. The DNS server on board the mesh node responds with REFUSED when it can't connect to other DNS servers, which is exactly what we want, so public DNS will pick up the missing responses. Windows handles this flawlessly by itself; on Linux one often needs to add it to resolv.conf manually to run split-brain. This is a spot Microsoft got right: each interface has its preferred DNS servers, and to my knowledge, if a domain is in the interface's search list, queries for that domain are explicitly sent out that interface (e.g., all local.mesh queries go out the mesh wired interface because that interface is on the same domain).
FQDN support was added to help Windows decide which interface to use (and to deal with locked-down laptops), and "Disable Default Route" was added after we went to FQDNs, precisely because of this capability of Windows to handle the routing correctly and easily.
Linux could use an update to the config that handles how DHCP information is parsed, so the mesh domain could be added as a 'local forward' override on the localhost DNS server. It would be fairly easy to write; it's just that getting it adopted into the main code line may be the issue.
Of course, if you have the WAN port of the mesh node plugged into your LAN or the Internet, this becomes somewhat moot: the mesh node will automatically forward your LAN-originated (but not WiFi/mesh-originated) packets out the WAN interface to where they need to go, seamlessly.
Placing the mesh namespace under the "real" DNS root does not mean making the mesh dependent on outside resources. If it did, I'd oppose it too. I'm not proposing any change in the way the mesh distributes name/IP address pairs, at least not at the moment. I'm only saying that it should also be possible to resolve mesh names through the regular DNS, assuming it's accessible and working. This will make it much easier to integrate the mesh into existing networks, including standard DNS servers, resolvers and caches. Just replace ".local.mesh" with a suffix known to the "real" DNS.
The .local pseudo-TLD is not apropos here because it has a completely different implementation and purpose. It uses multicasting specifically to avoid any infrastructure beyond the direct layer 2 link between the requesting and target hosts. No DNS servers, resolvers, caches or even IP routers are needed (the 224.0.0.251 and ff02::fb mDNS multicast addresses are never routed). It'll work even when you connect two offline laptops by a direct Ethernet cable.
Mesh nodes already contain DNS resolvers, so all they're really missing is AXFR (zone transfer) support. Then it would be easy to configure an existing DNS server (e.g., running ISC bind) as a secondary, periodically transferring the zone from a mesh node and serving it to the global DNS. You can have any number of secondaries, and every node in a given mesh has a copy of the same host table so different secondaries could pull the data from different nodes for robustness and load-sharing.
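To make that concrete: with AXFR enabled on a node, pulling the whole host table into a conventional DNS server would be a few lines of code. Here's a rough sketch using the dnspython library; the node address is just the example from earlier in this thread, and of course it won't work until the node side actually supports zone transfers.

import dns.query
import dns.zone

# Sketch only: pull the local.mesh zone from a mesh node via AXFR and print it.
# Assumes the node's DNS server supported zone transfers, which it doesn't today.
MESH_NODE = "10.18.74.69"   # any reachable node; address reused from the example above
ZONE_NAME = "local.mesh"

zone = dns.zone.from_xfr(dns.query.xfr(MESH_NODE, ZONE_NAME))
for name, node in zone.nodes.items():
    for rdataset in node.rdatasets:
        print(name, rdataset)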
I've never heard of interface-dependent DNS resolver selection but it's such a bizarre layering violation that it doesn't surprise me that Microsoft would do it. You can't pick an outbound interface until you know the IP address of your destination, which you don't know until you ask a DNS resolver, which you can't select until you know your outbound interface!
DNS is not one of my strengths, but it seems to me that while there are related issues here, we have two key ideas that can be individually focused on:
1) how DNS architecture on the mesh aligns with the global internet
2) devices on the mesh (and devices reaching in) clicking on URLs that show hostnames but jump directly to the IP address on the mesh.
Speaking on #2, Wouldn't this be an issue (to use IP addresses in the URL) if we wanted to implement an ability to do load balancing of a service or similar at the DNS level? We're not there today, but I think we'd all hope to be there in the future.
Speaking on #1, having a disconnected mesh network should, IMO, be an independent choice for mesh administrators. This is a different discussion, but assuming mesh admins do connect their mesh to the Internet to reach agencies and services that benefit an incident, there would be value in having name resolution back into the mesh. For example, during an incident the mesh might bridge a disaster area (total communications outage) and give the HQ of the Red Cross an Internet address to reach into the local Red Cross site command center vehicle with high-speed data access. I can see where, due to security concerns, this should be traffic initiated from the mesh going out. However, there are likely ways to ensure security to enable incoming traffic as well.
Joe AE6XE
"Speaking on #2, Wouldn't this be an issue (to use IP addresses in the URL) if we wanted to implement an ability to do load balancing of a service or similar at the DNS level? We're not there today, but I think we'd all hope to be there in the future. "
Strictly speaking, that ability exists today IF you understand the ramifications of doing so.
We don't have "best route" selection built into it (DNS names wouldn't know this anyway, though maybe we could put that into the DNS server), but if one sets two address reservations with the same name (do not set the node itself to that name; use an advertised reservation), you can actually load-balance a service across the network today.
Regarding your question about load balancing, it depends on what you're balancing. Servers? Network links?
Assuming the former, there are several ways to do it. If you do a DNS query for a very popular site like www.google.com, you'll get a set of short lived (3 minutes -- I just checked) IP addresses that change from time to time. (Their order is also supposed to be randomized, though I don't know if that's always done). This is how Google distributes its heavy load across a large set of servers (which must be rigorously identical). It's also how they steer you to a server close to you, similar to the way Akamai does it, although this is technically an abuse of the DNS because it was intended to give the same answers no matter where you ask the question. (This can cause problems when you use a third-party DNS resolver, e.g., Google's at 8.8.8.8).
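For anyone who wants to repeat that check, here's a quick sketch using the dnspython library (the TTL and number of addresses you get back will vary with time and location):

import dns.resolver

# Sketch: look at the TTL and address set returned for a heavily load-balanced name.
answer = dns.resolver.resolve("www.google.com", "A")
print("TTL:", answer.rrset.ttl, "seconds")
for rdata in answer:
    print(rdata.address)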
This can't be easily done in the mesh as it is currently designed. Name/IP address pairs are flooded throughout the network on routing updates, and each node caches them. There doesn't seem to be an explicit time-to-live field as there is in a DNS record to clear obsolete entries out of a cache. But it would be possible to flood multiple (relatively fixed) IP addresses for each name provided the local resolver (or host) selects one at random.
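The client side of that would be trivial; a toy sketch of the idea, with made-up names and addresses:

import random

# Toy sketch: if olsr flooded several (relatively fixed) addresses for one name,
# the local resolver could spread the load by answering with a random pick.
flooded = {
    "web.local.mesh": ["10.1.2.3", "10.4.5.6", "10.7.8.9"],   # made-up addresses
}

def resolve(name):
    addresses = flooded.get(name)
    return random.choice(addresses) if addresses else None

print(resolve("web.local.mesh"))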
But there's another really interesting way to do load balancing: anycasting. It only works for UDP services like the DNS but it can be leveraged for others. An 'anycast' IP address looks like an ordinary (non-multicast) IP address, but more than one server advertises it into the routing tables. The standard routing algorithm automatically picks the nearest server, something that doesn't happen if you just pick from several IP addresses at random.
This is how Google implements their public DNS resolver; 8.8.8.8 is an anycast address (so is 8.8.4.4). If you traceroute to it from San Diego, you'll get a server in LA. If you traceroute to it from (say) northern New Jersey, you'll probably get one in or near New York City. This automatically distributes the load on the servers and minimizes the load on the network. It also has the nice property of mitigating denial-of-service attacks, since an attack will only hit that nearest server. Several of the root DNS server addresses are also anycast, for the same reasons.
I don't think it should be terribly difficult to do anycasting in the mesh network, though I'd have to look at the details of olsr and run some tests to make sure.
I consider it a serious mistake to deliberately put the mesh namespace outside the "real" Internet namespace. An isolated network is far less useful than a connected network, especially for emergency public safety use but also for routine ham use.
Joe's point is very similar to what I've been thinking and saying for a while. It's all about integration. It doesn't matter how big, fast and reliable we make our mesh network; if it's not carefully integrated into a public safety agency's existing operations, it won't do them any good when it counts.
Local governments have become as dependent as anyone on "IP dialtone", transparent generic connectivity to the global Internet -- including its namespace. Therefore, IMHO, the single most important service we can provide them is to bridge commercial facilities knocked out by a disaster. If the physical links are in place, a public safety agency should be able to simply "turn a switch" (if even that) and regain Internet access using amateur facilities to reach some distant part of the "real" Internet that's still working. Ideally there should be no visible differences between normal and emergency operation, though some (like data rate) will certainly be unavoidable. Any concerns about security should be handled with administrative options, not hardwired into the architecture.
There is no guarantee there will be Internet available on the mesh; that is something you will need to discuss with your local mesh deployment, as there is a good chance the mesh may NOT have Internet available to it by policy (I've seen a lot of mesh networks choose not to allow Internet).
Ignoring that for a minute, the level of integration you're talking about means an AREDN node would be deployed somewhere in the head-end architecture of the facility (either in an isolated office or at the core of the network); it wouldn't be a "show up and everything is normal" deployment.
With that sort of integration you would be HIGHLY integrated into the network and part of its architecture; an AREDN node becomes a 'modem' at that point.
The network would already have the following in place for its day-to-day operations:
1) Client PCs on the network using existing DNS servers.
-- Those DNS servers could then be configured to forward all local.mesh queries to an AREDN node; the namespace is now fully integrated into the environment.
--- The con of relying on the PUBLIC space is that if the Internet fails you're 100% out of luck and zero resolution occurs; you also need a public single point of failure to consolidate all the "local.mesh" names from across the globe into the public namespace. For the mesh to work "when all else fails" it NEEDS to be outside the 'normal' namespace to be protected. (Well, OK, it could be in the normal namespace and use the same configuration as above, but there is no advantage to doing that; it still requires the same level of configuration on the internal DNS infrastructure.) If you rely on the public infrastructure, the minute the Internet goes down you're out. Or worse yet, a simple DDoS on the public servers, or a failure of the public servers that have to consolidate all the internal mesh names (making Internet access MANDATORY for every mesh node to report its name), would cause the advertisements to fail and you lose resolution.
2) You're going to be doing IP translation for those devices, as the IP space inside an AREDN node is a MAX of 13 devices.
3) A served agency likely has a heavy dependency on its internal phone system for "IP dial"; you would actually be integrating with the PHONE server at that point to choose AREDN as a network route, not with the phones, since all phones are always configured to contact a central point (the PBX). Most served agencies still (as of my last check) use in-house PBXs; this is especially true of an EOC, which needs its internal phones to work without Internet.
I could go on about the level of integration you would be doing; the point is that all these items would happen as part of a setup to integrate AREDN into the deployment as a redundant data path.
A quality network engineer would probably understand the concept; if you tell them "treat it like a cloud where you have only a few IP addresses you can NAT to, and put it at the gateway," they could integrate it into the network.
In those cases, though, the "local.mesh" namespace becomes irrelevant; the served agency won't be using the "local.mesh" namespace if they are relying on their existing infrastructure. It's only when the served agency decides to serve content over the mesh that they would have content in local.mesh.
BTW: if you do integrate at the head end, be prepared to somehow meet PCI/HIPAA/GLBA/Sarbanes-Oxley/various DoD security standards (for an EOC where you're in the core), FBI standards, etc.
Source:
10 years of on-the-job experience, currently holding the position of Senior Systems Engineer at a network security VAR serving GHE (Government, Health, Education), Federal, DoD, financial, commercial, non-profit, etc.
Deploying network security solutions from desktop to gateway, including endpoint protection, DLP, legal compliance, encryption, filtering, firewalls, IPS, etc.
I absolutely agree that the mesh itself must be able to run without depending on the "real" Internet in any way. That's a given for any kind of emergency communications. (Tunnel links are OK if the network has enough Part 97/Part 15 links to maintain connectivity and performance if the tunnels are lost.) But as I explained, simply being under the global namespace does not imply being dependent on external servers, and it can make integration and routine operation a lot easier.
The DNS is a highly distributed database, specifically designed to enable lots of caching and replication for both performance and robustness. Although I would locate any primary DNS server for a mesh network on or near the mesh itself, a secondary DNS server can still operate after being cut off from its primary; it simply won't get any updates. You specify all these timeouts in the SOA (start of authority) record for the domain. E.g., the SOA record for ka9q.net
ka9q.net. 21600 IN SOA maggie.ka9q.net. karn.ka9q.net. 2015061503 86400 7200 2419200 600
says that:
- the SOA record itself is valid for 6 hours (21,600 s);
- secondaries should refresh their zone files from a primary every day (86,400 s);
- if they can't reach a primary, they should retry every 2 hours (7,200 s);
- a secondary's copy of the zone remains valid for 28 days (2,419,200 s) even without a refresh from a primary;
- the default time-to-live for each record is 10 minutes (600 s), which is also how long an NXDOMAIN (non-existent domain) lookup will be cached.
These values are somewhat arbitrary but they're easily changed.
With my method of extracting a zone file from /run/hosts_olsr, it's easy to make any mesh node into a primary DNS server so we won't lack for them. The only remaining problem, and I think this is the one you're talking about, is ensuring that any resolvers used by clients continue to know about all these nearby servers even if their normal Internet connectivity is cut off and they're unable to get the usual NS glue records from the higher level servers. I think that's a manageable problem as long as we're aware of it.
In addition, it will be fairly visible if any encrypted traffic is attempted across the mesh.
I.e., someone trying to access IMAP mail over a secure connection, any website over HTTPS, etc.
As far as I'm concerned, the ham rules already say that pretty much anything goes in a bona-fide emergency. So I see no problem with encrypted traffic.
I also suspect that our "customers", the public safety agencies, would be significantly less interested in what we have to offer if we make them jump through all sorts of unnecessary hoops. Especially in an emergency!
I've already made a recommendation as to how to deal with encrypted traffic in routine ham operations. The IP (and IPv6) headers contain a 6-bit Differentiated Services Code Point (DSCP) that lets senders notify routers of some important property of their traffic. Specific values are defined and interpreted on a network-by-network basis. DSCP is most often used to indicate priority (e.g., interactive, bulk, or scavenger) or delay sensitivity (e.g., VoIP), but nothing keeps us from defining a codepoint (or a bit within each codepoint) that means "non-emergency encrypted traffic, handle by non-ham channels only".
Or more to the point, "not legal under Part 97, handle by non-ham channels only". That could include unsupervised third-party traffic, encrypted or clear.
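Marking is cheap at the sending end; on most stacks it's a single socket option that sets the DS field. A sketch in Python, where the codepoint value is purely a placeholder since nothing has actually been assigned:

import socket

# Sketch: how an application could mark its traffic with a DSCP codepoint.
# The value below is a made-up placeholder for a "not legal under Part 97,
# handle by non-ham channels only" marking; no such codepoint is defined today.
HYPOTHETICAL_DSCP = 0b100000
tos_byte = HYPOTHETICAL_DSCP << 2   # DSCP occupies the upper 6 bits of the old TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte)
# ...connect and send as usual; routers that look at DSCP can then apply
# whatever policy the mesh defines for that codepoint.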
The routing code in the Linux kernel already has the hooks needed to handle DSCP, including the ability to select a different routing table. If olsr topology control updates were to flag each link as "ham" (Part 97 rules) or "non-ham" (tunnel, ethernet, Part 15, licensed commercial radio) then each mesh node could keep two distinct routing tables: a normal one and a second one that ignores all Part 97 links. Traffic marked as encrypted would use the second table. Of course, many destinations will probably be unreachable in the second table, but them's the rules.
Why not just use iptables filters? Many good reasons, the most important being clean layering. I'm not being a purist; long experience has shown that while you might get away with a layering violation for some time, eventually it almost always bites you in the ass. The Internet protocols were specifically designed to put everything a router needs to do its job in the IP header. The unfortunate ubiquity of router packet filtering and deep packet inspection notwithstanding, routers have no legitimate reason to look at transport (TCP, UDP) or application data unless the router itself is the destination!
It makes much more sense to tell a ham meshnet router exactly what it needs to know (is this packet legal under Part 97?) than to force it to guess by snooping at port numbers. Transport layer port numbers are just a convention; consenting hosts are fully entitled to run IMAPS or SSH on port 12345 rather than 993 or 22 if they so choose, and a port-based router filter would miss this. Traffic on the "standard" ports would still be blocked even in an emergency. So would ssh on port 22 even if it were modified to only authenticate without encrypting.
Depending on the network topology, there might be an alternative, non-ham path to a destination that, while inferior to one using Part 97 links, could be used for encrypted traffic instead of blocking it entirely. But this could never work with port filtering; a separate routing table would be the only practical way.
Last night I whipped up a simple Perl script that runs on maggie.ka9q.net, my home router. Because it runs olsrd as a "native" mesh node, it has its own copy of /run/hosts_olsr, the current host table built by olsr flooding. My script reads the table, builds both forward and inverse (PTR) zone files, and kicks named to reload them. Seems to work fine, provided my client uses maggie as its DNS resolver. (Putting the mesh under a "real" TLD would remove this restriction.) At the moment I'm only building the 10.in-addr.arpa. reverse domain but I can add support for 172.31.0.0/16 fairly easily. What other subnets are in use?
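For anyone who wants to roll their own, the guts of the script are only a handful of lines. Here's a rough Python equivalent of what mine does -- a sketch, not my actual Perl, with the serial numbering and zone boilerplate simplified:

import time

# Rough sketch: read the olsr host table and emit forward (local.mesh) and
# reverse (10.in-addr.arpa) records for bind to load. Real use would also
# write proper SOA/NS records and then kick named to reload.
HOSTS_OLSR = "/run/hosts_olsr"
serial = time.strftime("%Y%m%d%H")   # crude date-based zone serial

forward, reverse = [], []
with open(HOSTS_OLSR) as f:
    for line in f:
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        addr, name = fields[0], fields[1]
        if ":" in addr:                         # ignore IPv6 entries for now
            continue
        if not name.endswith(".local.mesh"):
            name += ".local.mesh"
        forward.append(f"{name}.\tIN\tA\t{addr}")
        if addr.startswith("10."):
            _, b, c, d = addr.split(".")
            # PTR owner is relative to the 10.in-addr.arpa origin
            reverse.append(f"{d}.{c}.{b}\tIN\tPTR\t{name}.")

print("; zone data generated from", HOSTS_OLSR, "serial", serial)
print("\n".join(forward))
print("\n".join(reverse))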
But I think I can do better. DNS servers like ISC bind accept dynamic DNS registrations, so it should be possible to extract the host/IP address pairs from the routing updates and register them in the DNS in real time. That would minimize the delay before a new host appeared in the DNS; with my Perl script you have to wait for the next invocation for the zone file to be updated.
I could either modify olsrd to do this or write a separate program to monitor the olsr routing updates and extract the information from them.
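The registration side is also only a few lines with dnspython. A sketch, assuming a bind instance configured to accept dynamic updates (in real use it should be restricted with a TSIG key, which I've left out; the server address is a placeholder):

import dns.update
import dns.query

# Sketch: push a single name/address pair into the local.mesh zone as a
# dynamic update instead of regenerating the whole zone file.
ZONE = "local.mesh"
DNS_SERVER = "127.0.0.1"   # wherever bind is running

def register(name, address, ttl=600):
    update = dns.update.Update(ZONE)
    update.replace(name, ttl, "A", address)   # replace any existing A record for this name
    dns.query.tcp(update, DNS_SERVER)

register("kg6equ-mchs", "10.18.74.69")   # example host from earlier in the thread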
Or you could do as I suggested in post 9 and get immediate results all the time by using a forwarder entry, which lets you integrate seamlessly, without delay or hacking, by pointing at the DNS server built into an official mesh node.
I had thought about that, but I really like caching and replication for performance and robustness and it's somewhat difficult to do this way. You return DNS records with TTL=0, so bind can't cache them. Every request still hits your firmware, and I prefer not to hit small CPUs too heavily when I can avoid it.
By setting up my own primary authoritative server with its own zone file, I can set the zone timeouts to whatever I like. Since my DNS server runs olsrd natively, it doesn't need to query your firmware at all; it gets everything it needs from the usual router updates. I can also replicate this on as many nodes as I like, although it's true that you could also divide the forwarded load among several mesh nodes if you happen to have more than one.
I noticed that your default iptables configuration blocks DNS queries except from the local HNA (LAN). I'd already blown away all those filters (by modifying /overlay/etc/config/firewall), so it still worked when I forwarded the query to your firmware over VLAN2 from my router maggie. It would be nice to permit remote DNS queries; they can come in handy when trying to debug a problem or simply track a new name as it floods the network.
For some strange reason, every forwarded request to your DNS server causes bind on Linux to do a fresh query of the DNS root and to return a complete list of root servers as additional records in the relayed response. I checked -- they're definitely not in the records returned by your DNS server. It could be that this forwarded zone feature isn't heavily used. There's a "hint" feature very much like it that primes the list of root servers when bind first starts up, and maybe that explains it.
But there's a more serious problem. The symptom is that the forwarding works for a while, then inexplicably begins returning NXDOMAIN (nonexistent domain name) errors. I found out why when I got home.
As long as I only perform A (IPv4 address) queries, all is fine. But if I happen to send you an AAAA (IPv6 address) query, which can easily happen because I run dual stacks, you return a REFUSED response. This causes bind to go query the "real" DNS despite the forwarding entry, and since .local.mesh doesn't exist there, the "real" DNS returns a NXDOMAIN error which bind relays to the client. It also (correctly) caches it for some time so future queries, even for valid A records, continue to return NXDOMAIN. This is exactly the kind of hazard of a non-existent TLD that I was worried about. It does work when I run my own primary for local.mesh because, being authoritative, it never queries the external DNS. It also knows how to handle non-A queries correctly.
When you get a query for a non-existent record type for an existing domain name, the correct reply is an empty response with no error indication. If the name doesn't exist at all, return NXDOMAIN regardless of the query type.
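The difference is easy to see from the client side. With dnspython, for instance, the two cases surface as two different exceptions (the name here is the example from earlier in the thread):

import dns.resolver

# Sketch: a NODATA reply (name exists, but no records of the requested type)
# raises NoAnswer, while a truly missing name raises NXDOMAIN. The REFUSED
# reply described above is what pushes bind off to the "real" DNS instead.
try:
    dns.resolver.resolve("kg6equ-mchs.local.mesh", "AAAA")
except dns.resolver.NoAnswer:
    print("name exists, but has no AAAA records (correct NODATA behavior)")
except dns.resolver.NXDOMAIN:
    print("name does not exist at all")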
Do you really expect the format of /run/hosts_olsr to change? It's in 'host table' format, which is truly ancient -- we used it in the mid 1980s before the transition to the DNS. I expect each line to begin with an IP address in dotted-decimal (or hex, for IPv6) format, then arbitrary white space, then a host name, then optional additional white space. Everything after a '#' is a comment.
Is that likely to change?
I have written a simple independent DNS server that runs on port 53 of a 10.x.y.z-connected server machine and provides authoritative resolution of names in the .arn (Amateur Radio Network) TLD only. It gets its data from a local file that looks somewhat like a hosts file. Although the nodes provide resolution for the .mesh TLD, those names are based on complex node names, plus the host names of connected machines, which are likely non-obvious. Might such a service be useful across AREDN?
Was there an issue with ".arn" as a TLD, or am I not remembering that discussion correctly?
BTW this has been a very useful thread, thanks for finding it and posting here.
I expect I'll eventually reread all of it looking up the references that elude me.
73, ...dan wl7coo
I think it does make sense, and it might be very useful in light of the discussions above and on the side.
Here's hoping.
73, ...dan wl7coo
Adding an external DNS server to an area to serve another gTLD would require local users to configure their computers to use these DNS servers AND would create failure points if the server can't be reached across the mesh. I'm aware of some corporate computer policies that would actually forbid this, so you could have a Red Cross laptop show up locally that can only get to the items available under local.mesh, because its DNS servers are set by the DHCP lease and the user has no ability at all to reconfigure the server settings.
The on-node DNS has been designed to always reflect the network; it "just works" without any back-end configuration of the computer, and it updates with the network. You will still have the same namespace issue of "what do I call it" if you try to move to a new gTLD as you have if you stay in the node's local.mesh space.
Just some thoughts to keep in mind.
Another thing that just works (with Windows), as long as you have distinct network address ranges, is to use two or three separate NICs. Windows manages to go to the right network as a function of the address or name somehow... here I have the 10.x.x.x, the 44.x.x.x, and my 192.168.1.x networks all connected at the same time and it just works. Actually, I would like to have TWO independent 10.x.x.x networks, but I have a feeling that is not possible.
It works great.
(and the second NIC was like $12 - Netgear Gigabit.)