v055: can't resolve .local domains - never get any DNS response #1005

Closed
Rhys-T opened this issue Aug 20, 2023 · 19 comments

@Rhys-T

Rhys-T commented Aug 20, 2023

Possibly a followup to the discussions about mDNS in #26:

After updating Rethink to v055[1], I can no longer resolve domains in the .local TLD. Other, more 'normal', domains work fine. Previously, even though Rethink couldn't resolve .locals through mDNS (and Android wouldn't do mDNS itself since it thought it was on a VPN), it would at least pass them on via normal DNS to personalDNSFilter[2], where I could manually enter addresses for them and update them when needed.

If I run dig some-machine.local under Termux, it sits there for about 18 seconds, then tells me connection timed out; no servers could be reached. Neither Rethink nor pDNSf shows that domain being requested in their logs. Wireshark (running on the machine I'm asking for) never sees the query. The Rethink packet capture does show the query being sent a few times (to 8.8.8.8 and 8.8.4.4), but no response.

(To be clear, it fails under normal Android apps too, not just Termux/Linux commands. I'm just using dig to see how it's failing.)

If I talk directly to pDNSf by doing dig @127.0.0.1 -p5300 some-machine.local, I can still get the address I've manually set there.

I've tried various combinations of:

  • Turning off "DNS booster"
  • Turning on "Do not route Private IPs"
  • Pressing the DNS refresh button

Footnotes

  1. The F-Droid version, if it matters.

  2. Specifically, the test build from IngoZenz/personaldnsfilter#264 (comment), so that it doesn't kill Rethink during startup.

@ignoramous ignoramous self-assigned this Aug 21, 2023
@ignoramous ignoramous added the bug Something isn't working label Aug 21, 2023
@ignoramous
Collaborator

Yeah, we implemented mDNS but have no way to test it. And then this happens. Fully expected (:

I'd imagine that if you turn ON Do not route Private IPs, mDNS queries should be left out of Rethink's VPN tunnel. Surprised that's not the case.

@ignoramous
Collaborator

any fixes we make will be a shot in the dark, so unlikely this is fixed by next version or two; here's the code in case you can spot something wrong https://github.com/celzero/firestack/blob/2191d5b32f29d85058e3d513832204da36be626b/intra/dns53/mdns.go#L129 :)

@Rhys-T
Author

Rhys-T commented Aug 21, 2023

Yeah, we implemented mDNS but have no way to test it. And then this happens. Fully expected (:

any fixes we make will be a shot in the dark, so unlikely this is fixed by next version or two

Fair enough. In the meantime, would it be possible to add a setting to allow .locals to be treated the same as other domains like they were before? My understanding is that on some networks, .local is still being used as a 'normal' (made-up) domain, handled by a normal (internal) DNS server, so such a setting would still be useful there even after Rethink's mDNS implementation is fixed.
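The setting proposed above could be sketched roughly as follows. This is only an illustration of the routing decision, not Rethink's code: `pickTransport`, the `localViaUnicast` flag, and the transport names "mdns" and "default" are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// pickTransport decides which (hypothetical) transport should answer a query.
// localViaUnicast stands in for the proposed user setting: when true, .local
// names are forwarded to the regular upstream resolver like any other name.
func pickTransport(qname string, localViaUnicast bool) string {
	// DNS names compare case-insensitively and may carry a trailing root dot.
	name := strings.ToLower(strings.TrimSuffix(qname, "."))
	isLocal := name == "local" || strings.HasSuffix(name, ".local")
	if isLocal && !localViaUnicast {
		return "mdns"
	}
	return "default"
}

func main() {
	fmt.Println(pickTransport("some-machine.local", false))
	fmt.Println(pickTransport("some-machine.local", true))
	fmt.Println(pickTransport("example.com", false))
}
```

With the flag on, a manually maintained entry in an upstream resolver such as pDNSf would be reachable again, at the cost of one more code path to maintain.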

I'd imagine that if you turn ON Do not route Private IPs, mDNS queries should be left out of Rethink's VPN tunnel. Surprised that's not the case.

If a program explicitly makes an mDNS query itself (e.g. dig @224.0.0.251 -p5353 some-machine.local), then turning on that setting does allow it to work. However, it has no effect if I try asking a normal (port 53, unicast) DNS server, knowing that Rethink will intercept it (e.g. dig some-machine.local) - or if a program just asks 'the system' to resolve it (e.g. ssh [email protected], or for an Android-native example, telling AVNC to connect to some-machine.local).
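To make the difference between the two dig invocations concrete: the query payload is essentially the same in both cases, and only the destination changes (224.0.0.251:5353 for mDNS, a unicast server on port 53 otherwise). The sketch below hand-encodes a minimal A query with the standard library; `buildQuery` and `oneshot` are illustrative names, not functions from Rethink or firestack. Sending from an ephemeral port makes this a "one-shot" legacy query, which responders are expected to answer by unicast.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
	"time"
)

const (
	typeA   uint16 = 1 // host address record
	classIN uint16 = 1 // Internet class
)

// buildQuery encodes a single-question DNS query. ID and flags stay zero.
func buildQuery(name string, qtype uint16) ([]byte, error) {
	msg := make([]byte, 12, 64) // 12-byte header, all zero
	msg[5] = 1                  // QDCOUNT = 1 (big-endian uint16 at bytes 4..5)
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		if len(label) == 0 || len(label) > 63 {
			return nil, errors.New("invalid label in " + name)
		}
		msg = append(msg, byte(len(label)))
		msg = append(msg, label...)
	}
	msg = append(msg, 0) // root label terminates the name
	msg = append(msg, byte(qtype>>8), byte(qtype))
	msg = append(msg, byte(classIN>>8), byte(classIN))
	return msg, nil
}

// oneshot sends query to dst and waits for one reply. The socket is left
// unconnected because an mDNS reply arrives from the responder's own
// address, not from the multicast group the query was sent to.
func oneshot(dst string, query []byte, wait time.Duration) ([]byte, error) {
	raddr, err := net.ResolveUDPAddr("udp4", dst)
	if err != nil {
		return nil, err
	}
	conn, err := net.ListenUDP("udp4", nil)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(wait)); err != nil {
		return nil, err
	}
	if _, err := conn.WriteToUDP(query, raddr); err != nil {
		return nil, err
	}
	buf := make([]byte, 1500)
	n, _, err := conn.ReadFromUDP(buf)
	if err != nil {
		return nil, err
	}
	return buf[:n], nil
}

func main() {
	q, err := buildQuery("some-machine.local", typeA)
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Printf("query (%d bytes): %x\n", len(q), q)
	// Same payload, different destinations:
	//   oneshot("224.0.0.251:5353", q, 3*time.Second) // multicast DNS
	//   oneshot("8.8.8.8:53", q, 3*time.Second)       // unicast DNS
}
```

The network calls are left commented out in `main` so the program runs without touching the network; uncomment one on a LAN with an mDNS responder to try it.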

any fixes we make will be a shot in the dark, so unlikely this is fixed by next version or two; here's the code in case you can spot something wrong celzero/firestack@2191d5b/intra/dns53/mdns.go#L129 :)

I'll admit, Go isn't a language that I've gotten around to learning yet. But it seems to be roughly the same 'shape' as most imperative languages these days, so I'll see what I can make of it.


Additional notes:

The packet capture from Rethink shows the normal-DNS query coming from dig, but not the mDNS query that should be getting sent from Rethink itself. Does that indicate what the problem is? Or is it normal, since I'm just seeing packets that pass through the 'VPN' side of the Rethink app (i.e. the 10.111.222/24 network)?

Setting an app to 'bypass' allows it to resolve .local names successfully (through Android's mDNS client, I think?).

Looking in the logs, these are the only messages that mention mDNS when I try dig some-machine.local:

  • I dns: udp: suggest system-dns mdns for some-machine.local
  • I mdns: closing client {true true 0x4000314e10 0x4000314e18 <nil> <nil> map[] 0x4001313860 true 1 0x40002943c0}
  • the same two messages one second later (though with different addresses(?) on the second one), when dig switches from trying 8.8.8.8 to 8.8.4.4
  • But I don't see another pair of messages the next two times dig retries those servers.

I thought at first that it wasn't even making it to oneshotQuery - or at least not to line 92 - but then I realized that it says log.D, so it's just being logged at 'debug' level and getting discarded. Is there any way I can set the log level to include these, without having to switch to a debug build of the Rethink app?

In any case, I'm not seeing any errors, nor any sign that it's getting a response (which makes sense, given that the query is never reaching the machine that would be able to give that response).

Is there anything else you need me to look for in the logs that might be helpful?


On a slightly-related note, when Rethink starts up, I'm seeing a log entry that says I mdns(46) setup: %!s(MISSING). It looks like that log call has two %ss, but only passes one value into it. I'm not sure which parameter is missing, or what the other one was supposed to be. Doesn't seem to be indicating a real error though.
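For anyone unfamiliar with that marker: Go's fmt package prints `%!s(MISSING)` whenever a format string has more verbs than arguments. A minimal reproduction is below, assuming the log call had two `%s` verbs as guessed above; the format text and the helper names are made up for illustration and are not taken from firestack. The format is assembled at run time only so that `go vet`'s printf check, which inspects constant format strings, does not reject the deliberate mistake.

```go
package main

import (
	"fmt"
	"strings"
)

// setupFormat returns a format string with two %s verbs. It is built at run
// time so that go vet's printf check does not flag this broken example.
func setupFormat() string {
	return strings.Join([]string{"mdns(%s)", "setup:", "%s"}, " ")
}

// renderSetup formats the message with however many arguments it is given.
func renderSetup(args ...any) string {
	return fmt.Sprintf(setupFormat(), args...)
}

func main() {
	fmt.Println(renderSetup("46"))          // one argument for two verbs
	fmt.Println(renderSetup("46", "ready")) // both verbs satisfied
}
```

With a constant format string, `go vet` would normally report the argument count mismatch before the code ever runs.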

@ignoramous
Collaborator

...would it be possible to add a setting to allow .locals to be treated the same as other domains like they were before?

Possible but it adds yet another knob in our app and yet another if condition in the code... I'd only want to add it if it helps despite Rethink getting mDNS to work (in some happy future).

If a program explicitly makes an mDNS query itself, then turning on that setting does allow it to work.

Gotcha. That's how I'd expect Do not route Private IPs to work... So, no surprises here.

I'll admit, Go isn't a language that I've gotten around to learning yet.

Hm. For me, Go was one of the easiest languages to learn. Fastest I got proficient in. It is a mix of limited versions of both JS + Python.

The packet capture from Rethink shows the normal-DNS query coming from dig, but not the mDNS query that should be getting sent from Rethink itself. Does that indicate what the problem is?

Yep. This means, the mdns transport we've wired in isn't sending out the queries at all. And this would be right as I see the mdns transport is in fact not doing anything, neither does it error out, but closes immediately within a millisecond (instead of waiting for 3 seconds). This is what appears in the logs as

I mdns: closing client {true true 0x4000314e10 0x4000314e18 <nil> <nil> map[] 0x4001313860 true 1 0x40002943c0}

Is there any way I can set the log level to include these, without having to switch to a debug build of the Rethink app?

Not yet. Good call though. We should include this in the release builds, too.

Setting an app to 'bypass' allows it to resolve .local names successfully (through Android's mDNS client, I think?).

You mean exclude? Yeah, that should let Android's built-in mDNS support kick-in for just the excluded app.

Is there anything else you need me to look for in the logs that might be helpful?

I think there's some stupid bug in our code. I already spent an hour today without making any progress. I intend to spend some more time tomorrow. Let's see.

Doesn't seem to be indicating a real error though.

Yep, that "error" doesn't matter, but I've fixed it, anyway.

@Rhys-T
Author

Rhys-T commented Aug 22, 2023

I've also tried changing the IP version between IPv4, IPv6, and auto - that didn't seem to affect it either.

Possible but it adds yet another knob in our app and yet another if condition in the code... I'd only want to add it if it helps despite Rethink getting mDNS to work (in some happy future).

Fair enough. I suspect non-mDNS usage of .local is probably getting rarer these days.

Hm. For me, Go was one of the easiest languages to learn. Fastest I got proficient in. It is a mix of limited versions of both JS + Python.

Oh, I wasn't saying it was a difficult language - I just haven't taken the time to learn it yet.

Yep. This means, the mdns transport we've wired in isn't sending out the queries at all. And this would be right as I see the mdns transport is in fact not doing anything, neither does it error out, but closes immediately within a millisecond (instead of waiting for 3 seconds). This is what appears in the logs as

I mdns: closing client {true true 0x4000314e10 0x4000314e18 <nil> <nil> map[] 0x4001313860 true 1 0x40002943c0}

Gotcha. I didn't know whether 'upstream' DNS/mDNS queries and responses would show up in the packet capture or not.

I'm not familiar enough with the app's architecture to know what the logs should look like, but in hindsight, it makes sense that closing client shouldn't be happening immediately. Not sure what would be causing that, though.

[re: setting the log level] Not yet. Good call though. We should include this in the release builds, too.

Okay. Just wanted to check in case it would help with tracking this down.

You mean exclude? Yeah, that should let Android's built-in mDNS support kick-in for just the excluded app.

🤦 Yeah, I meant 'exclude'. Not sure what happened there - I think I was testing the 'bypass' options as well, just to be on the safe side (they didn't affect anything), then got mixed up and wrote the wrong one down. Sorry about that.

I think there's some stupid bug in our code. I already spent an hour today without making any progress. I intend to spend some more time tomorrow. Let's see.

Thank you for looking into this. Let me know if there's anything else I can do to help track this down.

@ignoramous
Collaborator

it makes sense that closing client shouldn't be happening immediately. Not sure what would be causing that, though.

I think this was some sort of bug in the Go runtime. A nil pointer didn't cause a panic but instead the client was being mopped up... who knows.

We've fixed the nil pointer now, and expect the client NOT to close right away but to wait for responses. Whether the client works at all is another matter: celzero/firestack@1905248

@ignoramous
Collaborator

ignoramous commented Aug 24, 2023

Trying to test this with rclone... tailscale/tailscale#1013 (comment)

Edit: no cookie.

@ignoramous
Collaborator

ignoramous commented Sep 6, 2023

@Rhys-T Can you see if in v055a, mDNS works for you?

Also, you can enable Debug logs from Configure -> Settings -> Log level and adb logcat | grep GoLog or adb logcat | grep -iE '(mdns|dns:)' to see what's up.

@Rhys-T
Author

Rhys-T commented Sep 6, 2023

I'm currently running the F-Droid build of Rethink. v055a isn't available there yet, and I can't install it from another source without uninstalling, because of the certificates. I would use Rethink's backup/restore system to move over to the GitHub or Google Play version, but it sounds like there are issues with that right now (#986, #975). Is there another way that I can preserve the settings while switching to another build? Or should I just wait for F-Droid to update it and check then?

@ignoramous
Collaborator

Please wait for F Droid to get the update. Backup and restore is a hairy beast.

@Rhys-T
Author

Rhys-T commented Sep 9, 2023

Okay, v055a showed up on F-Droid earlier today, and I've gotten a chance to do some testing now. It's still not resolving *.local correctly, but I think it's getting closer than it was before. According to the logs, it's actually waiting about three seconds before it times out and closes the client, instead of closing it immediately like it did in v055.

Rethink's packet capture still only shows the normal-DNS query being sent by Termux/dig, and not the upstream mDNS query or response - but the target machine does actually receive the query now, and is sending back the response. It looks like Rethink just isn't seeing that response for some reason. Correction: If I'm reading the code right, the no valid answers message (which happens instantly after sending the query) actually means that Rethink is getting the response packet, but doesn't think any of the records inside are relevant to that query.

I noticed that the names in the responses were mixed-case while I had entered the query as lowercase, and thought maybe the strings.Contains() call was failing to match because it's case-sensitive. However, it still does the same thing if I query using the same capitalization that the response uses (even when that's all lowercase), so that doesn't seem to be the problem right now.

(I'm entering a specific server address in the dig command below to make sure it only sends a single query. By default it sends one query to 8.8.8.8, and one to 8.8.4.4 about a second later. Both get intercepted by Rethink, and each one fails the same way, but their log messages get interleaved, making it that much harder to tell what's going on.)

The logcat output when I run dig @8.8.8.8 some-machine.local
09-08 19:46:22.194  3482 26515 I GoLog   : V ns.dispatchers.dispatch: got(85 bytes), err(<nil>)
09-08 19:46:22.194  3482 26515 I GoLog   : V ns.dispatchers.dispatch (from-tun) proto(2048) for pkt-id(0)
09-08 19:46:22.194  3482 26515 I GoLog   : V ns.dispatchers.dispatch: resume
09-08 19:46:22.194  3482 16672 I GoLog   : V ns.e.inject-inbound(from-tun) 2048 pkt(0)
09-08 19:46:22.194  3482 16672 I GoLog   : V dns64: handle: No local nat64 to for ip(8.8.8.8)
09-08 19:46:22.194  3482 16672 I GoLog   : V udp: onFlow: no realips() or domains(), for src=10.111.222.1:52568 dst=8.8.8.8:53
09-08 19:46:22.197  3482 16672 D VpnLifecycle: process-firewall-request: ConnTrackerMetaData(uid=10324, sourceIP=10.111.222.1, sourcePort=52568, destIP=8.8.8.8, destPort=53, timestamp=1694216782197, isBlocked=false, blockedByRule=, blocklists=, protocol=17, query=, connId=aa5cad0328b29b1c), true, false
09-08 19:46:22.198  3482 16672 I GoLog   : V ns.udp.forwarder: NEW src(10.111.222.1:52568) => dst(8.8.8.8:53)
09-08 19:46:22.198  3482 16672 I GoLog   : V ns.udp.forwarder: DATA src(10.111.222.1:52568) => dst(l:10.111.222.1:52568 / r:8.8.8.8:53)
09-08 19:46:22.198  3482 16672 I GoLog   : I dns: udp: suggest dns(mdns) for some-machine.local
09-08 19:46:22.198  3482 16672 I GoLog   : V wall: no local blockerQ; letting through some-machine.local.
09-08 19:46:22.198  3482 16672 I GoLog   : V dns: udp: query NOT blocked some-machine.local; why? no rdns
09-08 19:46:22.198  3482 16672 I GoLog   : D mdns: query: some-machine.local
09-08 19:46:22.198  3482  4084 I GoLog   : V wall: no local blockerQ; letting through some-machine.local.
09-08 19:46:22.199  3482 16672 I GoLog   : D mdns: sent query4 some-machine.local.
09-08 19:46:22.199  3482 16672 I GoLog   : D mdns: waiting for answers for some-machine.local.
09-08 19:46:22.270  3482  4084 I GoLog   : D mdns: no valid answers for some-machine.local.
09-08 19:46:25.202  3482 16672 I GoLog   : W mdns: timeout for some-machine.local.
09-08 19:46:25.203  3482 16672 I GoLog   : D mdns: done: got answers 0 for some-machine.local.
09-08 19:46:25.203  3482 16672 I GoLog   : D mdns: awaiting response some-machine.local
09-08 19:46:25.203  3482 16672 I GoLog   : I mdns: no response for some-machine.local
09-08 19:46:25.203  3482 16672 I GoLog   : I mdns: closing client {true false 0x40001c4250 <nil> <nil> <nil> map[] 0x4000683da0 true 1 0x400059ca20}
09-08 19:46:27.195  3482 26515 I GoLog   : V ns.dispatchers.dispatch: got(85 bytes), err(<nil>)
09-08 19:46:27.195  3482 26515 I GoLog   : V ns.dispatchers.dispatch (from-tun) proto(2048) for pkt-id(0)
09-08 19:46:27.196  3482 26515 I GoLog   : V ns.dispatchers.dispatch: resume
09-08 19:46:27.196  3482 16672 I GoLog   : V ns.e.inject-inbound(from-tun) 2048 pkt(0)

@ignoramous
Collaborator

thought maybe the strings.Contains() call was failing to match because it's case-sensitive.

Good catch. Even though it is unlikely to be the issue here, I've fixed it anyway: celzero/firestack@f7190cb

And added a bunch more logs(celzero/firestack@3796d92, celzero/firestack@2674ed6). I'll keep this thread updated as I work my way to running tests with mdns myself. I've bothered you enough.

@ignoramous
Collaborator

ignoramous commented Sep 13, 2023

mDNS had a deadlocked channel and an infinite loop:

  • Basically, resch / ansch (of qcontext) must be buffered, and select statements must account for reads from closed channels (celzero/firestack@ae6899a).
  • ansch / resch (in client.query) must be read from a goroutine different from the one that writes to them (celzero/firestack@5941f62). That writer in turn reads from msgCh (in client.listen), which is itself written to by the UDP socket listening for incoming mDNS responses (in client.recv).

mdns setup on deb:

# h = <hostname-without-.local-suffix>
sudo avahi-set-host-name <h>

avahi-resolve -n <h>.local
# <h>.local	172.17.1.2
# or
# <h>.local	fe80::172:17:1:2

# shows my local hostname being broadcast
avahi-browse -r -a -t

Unfortunately, neither Termux (excluded from Rethink or otherwise) with dig <hostname>.local @224.0.0.251 -p5353 or dig <hostname>.local, nor Rethink (0 answers for .local hit from browsers) could resolve it.

➜  serverless-dns git:(main) ✗ adb logcat | grep "mdns:"
09-13 17:35:57.453 28983  6003 I GoLog   : D mdns: oquery: <hostname>.local
09-13 17:35:57.455 28983  6003 I GoLog   : D mdns: send: sent query4 <hostname>.local.
09-13 17:35:57.455 28983  6003 I GoLog   : D mdns: send: sent query6 <hostname>.local.

09-13 17:35:57.455 28983  6003 I GoLog   : D mdns: query: waiting for ans to <hostname>
09-13 17:35:57.455 28983  6003 I GoLog   : D mdns: oquery: awaiting response <hostname>.local

09-13 17:36:00.457 28983  6003 I GoLog   : W mdns: listen: timeout for <hostname>.local.
09-13 17:36:00.457 28983  6003 I GoLog   : D mdns: listen: done; got answers 0 for <hostname>.local.

09-13 17:36:00.457 28983  6003 I GoLog   : I mdns: oquery: no response for <hostname>.local
09-13 17:36:00.457 28983  6003 I GoLog   : I mdns: closing client 

09-13 17:36:02.456 28983  1145 I GoLog   : W mdns: recv: from(<nil>); closed; bytes(0), err(read udp4 0.0.0.0:37044: i/o timeout)
09-13 17:36:02.456 28983  6003 I GoLog   : W mdns: recv: from(<nil>); closed; bytes(0), err(read udp6 [::]:36986: i/o timeout)

@Rhys-T
Author

Rhys-T commented Sep 13, 2023

excluded from Rethink or otherwise

That's odd - excluding it makes both of those commands work on my device. Does it work on yours with Rethink turned off entirely?

@ignoramous
Collaborator

That's odd - excluding it makes both of those commands work on my device. Does it work on yours with Rethink turned off entirely?

Yeah, no difference whether rethink is ON or OFF. I think my mdns setup isn't fully working... I'll keep looking for ways to test this over the next few days, though.

ignoramous referenced this issue in celzero/firestack Sep 14, 2023
ignoramous referenced this issue in celzero/firestack Sep 18, 2023
@ignoramous
Collaborator

Hi @Rhys-T: Can you please check if v055e (F-Droid should have the build in 3 days or so) works for .local queries? I tried to set something up on my local network and avahi on my laptop responding to xyz.local, but could never get it to work; as in, either Rethink doesn't yet impl mDNS as it should (very likely) or my setup is whack (also equally likely).

@Rhys-T
Author

Rhys-T commented Apr 6, 2024

@ignoramous I'll keep an eye out for it. Thanks.

@Rhys-T
Author

Rhys-T commented Apr 11, 2024

@ignoramous Sorry for the delay. Apparently I wasn't getting update notifications from F-Droid for some reason[1] - I just discovered last night that the new version of Rethink (among other things) had shown up.

I just set Termux and AVNC back to the normal 'allow' settings, and I can resolve .local domains inside both of them - even after removing the manual .local entries from pDNSf, restarting pDNSf, and pressing the DNS Refresh button in Rethink. (I also tried another .local that had never been in pDNSf, just to be extra sure.) So far, it all seems to be working properly!

Thank you so much for your help with this.

Footnotes

  1. I probably just missed turning off one of the battery optimization settings or something like that, and the OS killed the F-Droid app.

@ignoramous
Collaborator

Thank you for doing our design, research, and testing for us! We merely implemented it (erroneously so, for 4+ versions).


mDNS aside, I doubt p2p apps would still work, unless Do not route Private IPs is enabled or those apps are Excluded from Rethink altogether.

Seems like some Android limitation (ex: #1356), but I'm yet to fully get to the bottom of it.

(closing this issue, feel free to reopen)
