|
When IPv6 nameservers are present, __res_msend_rc attempts to disable the
IPV6_V6ONLY socket option to ensure that it can communicate with IPv4
nameservers (if they are present too) via IPv4-mapped IPv6 addresses.
However, this option can't be disabled on bound sockets, so setsockopt
always fails.
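
As an illustration of the ordering constraint described above, here is a minimal sketch, not the actual __res_msend_rc code. The helper names `map_v4_to_v6` and `open_dual_stack_udp` are hypothetical. The first shows how an IPv4 nameserver address can be expressed as an IPv4-mapped IPv6 address; the second clears IPV6_V6ONLY before calling bind, on the assumption that this is the ordering the socket requires.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: rewrite an IPv4 address as ::ffff:a.b.c.d so it
 * can be used on an AF_INET6 socket that has IPV6_V6ONLY disabled. */
void map_v4_to_v6(const struct sockaddr_in *in4, struct sockaddr_in6 *out6)
{
    memset(out6, 0, sizeof *out6);
    out6->sin6_family = AF_INET6;
    out6->sin6_port = in4->sin_port;
    /* 80 zero bits, then 16 one bits, then the 32-bit IPv4 address */
    memcpy(out6->sin6_addr.s6_addr, "\0\0\0\0\0\0\0\0\0\0\xff\xff", 12);
    memcpy(out6->sin6_addr.s6_addr + 12, &in4->sin_addr, 4);
}

/* Hypothetical helper: open a dual-stack UDP socket. The option is
 * cleared BEFORE bind(), since it cannot be changed once bound. */
int open_dual_stack_udp(void)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    int off = 0;
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof off) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in6 any;
    memset(&any, 0, sizeof any);
    any.sin6_family = AF_INET6;   /* wildcard address, ephemeral port */
    if (bind(fd, (struct sockaddr *)&any, sizeof any) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Whether a failed setsockopt should be fatal, as it is in this sketch, is a design choice; a resolver might instead carry on with IPv6-only nameservers.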
|
|
A zero returned from recvmsg is currently treated as if some data were
received, so if a DNS server closes its TCP socket before sending the
full answer, __res_msend_rc will spin until the timeout elapses because
a POLLIN event will be reported on each poll. Fix this by treating an
early EOF as an error.
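
The idea can be sketched as follows. This is a simplified, blocking illustration rather than the resolver's poll-driven loop, and `read_full` is a hypothetical helper name. The point is only that a return value of 0 from recvmsg on a stream socket means the peer closed the connection, so it is reported as a failure instead of being counted as progress.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: read exactly `want` bytes from a stream socket.
 * Returns 0 on success, -1 on error or if the peer closes early. */
int read_full(int fd, unsigned char *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        struct iovec iov = { .iov_base = buf + got, .iov_len = want - got };
        struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1 };
        ssize_t r = recvmsg(fd, &mh, 0);
        if (r < 0) {
            if (errno == EINTR) continue;   /* interrupted: try again */
            return -1;
        }
        if (r == 0)
            return -1;   /* EOF before the full answer: an error, not data */
        got += (size_t)r;
    }
    return 0;
}
```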
|
|
Before this commit, DNS timeouts always used CLOCK_REALTIME, which
could produce spurious timeouts or delays if wall time changed for
whatever reason.
Now we try CLOCK_MONOTONIC and only fall back to CLOCK_REALTIME when
it is unavailable.
|
|
we already attempt to preclude this case by having res_send use a
sufficiently large temporary buffer even if the caller did not provide
one as large as or larger than the udp dns max of 512 bytes. however,
it's possible that the caller passed a custom-crafted query packet
using EDNS0, e.g. to get detailed DNSSEC results, with a larger udp
size allowance.
I have also seen claims that there are some broken nameservers in the
wild that do not honor the dns udp limit of 512 and send large answers
without the TC bit set, when the query was not using EDNS.
we generally don't aim to support broken nameservers, but in this case
both problems, if the latter is even real, have a common solution:
using recvmsg instead of recvfrom so we can examine the MSG_TRUNC
flag.
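
To illustrate why recvmsg helps here: unlike recvfrom, it reports per-message flags through `msg_flags`, so a datagram that did not fit in the buffer can be detected. The following is a sketch with a hypothetical helper name (`recv_dgram`), not the resolver's actual receive path.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: receive one datagram and report whether the
 * kernel had to cut it short to fit `len` bytes.
 * Returns the number of bytes stored, or -1 on error. */
ssize_t recv_dgram(int fd, void *buf, size_t len, int *truncated)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1 };
    ssize_t r = recvmsg(fd, &mh, 0);
    if (r < 0) return -1;
    /* MSG_TRUNC in msg_flags: the datagram was longer than the buffer */
    *truncated = (mh.msg_flags & MSG_TRUNC) != 0;
    return r;
}
```

A caller that sees `truncated` set can treat the answer the same way as one carrying the TC bit.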
|
|
tcp fallback was originally deemed unwanted and unnecessary, since we
aim to return a bounded-size result from getaddrinfo anyway and
normally plenty of address records fit in the 512-byte udp dns limit.
however, this turned out to have several problems:
- some recursive nameservers truncate by omitting all the answers,
rather than sending as many as can fit.
- a pathological worst-case CNAME for a worst-case name can fill the
entire 512-byte space with just the two names, leaving no room for
any addresses.
- the res_* family of interfaces allows querying of non-address records
such as TLSA (DANE), TXT, etc. which can be very large. for many of
these, it's critical that the caller see the whole RRset. also,
res_send/res_query are specified to return the complete, untruncated
length so that the caller can retry with an appropriately-sized
buffer. determining this is not possible without tcp.
so, it's time to add tcp fallback.
the fallback strategy implemented here uses one tcp socket per
question (1 or 2 questions), initiated via tcp fastopen when possible.
the connection is made to the nameserver that issued the truncated
answer. right now, fallback happens unconditionally when truncation is
seen. this can be, and may later be, relaxed for queries made by the
getaddrinfo system, since it will only use a bounded number of results
anyway.
retry is not attempted again after failure over tcp. the logic could
easily be adapted to do that, but it's of questionable value, since
the tcp stack automatically handles retransmission and the successful
answer with TC=1 over udp strongly suggests that the nameserver has
the full answer ready to give. further retry would likely just "take
longer to fail".
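
two wire-format details underlie the fallback: the TC flag is bit 0x02 of the third header byte, and dns messages sent over tcp are preceded by a two-byte big-endian length (RFC 1035, section 4.2.2). a minimal sketch of both follows; the helper names `dns_is_truncated` and `dns_tcp_frame` are hypothetical and this is not the code added by the commit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does this response carry the TC (truncated) flag? */
int dns_is_truncated(const unsigned char *resp, size_t len)
{
    return len >= 3 && (resp[2] & 0x02) != 0;
}

/* Hypothetical helper: prepend the 2-byte big-endian length that DNS
 * over TCP requires. Returns the framed size, or 0 if it does not fit. */
size_t dns_tcp_frame(const unsigned char *q, size_t qlen,
                     unsigned char *out, size_t outsz)
{
    if (qlen > 0xffff || outsz < qlen + 2) return 0;
    out[0] = (unsigned char)(qlen >> 8);
    out[1] = (unsigned char)(qlen & 0xff);
    memcpy(out + 2, q, qlen);
    return qlen + 2;
}
```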
|
|
this is groundwork for TCP fallback support, but does not itself
change behavior in any way.
|
|
if resolv.conf lists no nameservers at all, the default of 127.0.0.1
is used. however, another "no nameservers" case arises where the
system has ipv6 support disabled/configured-out and resolv.conf only
contains v6 nameservers. this caused the resolver to repeat socket
operations that will necessarily fail (sending to one or more
wrong-family addresses) while waiting for a timeout.
it would be contrary to configured intent to query 127.0.0.1 in this
case, but the current behavior is not conducive to diagnosing the
configuration problem. instead, fail immediately with EAI_SYSTEM and
errno==EAFNOSUPPORT so that the configuration error is reportable.
|
|
apparently this code path was never tested, as it's not usual to have
v6 nameservers listed on a system without v6 networking support. but
it was always intended to work.
when reverting to binding a v4 address, also revert the family in the
sockaddr structure and the socklen for it. otherwise bind will just
fail due to mismatched family/sockaddr size.
fix dns resolver fallback when v6 nameservers are listed by
|
|
|
|
The variable nss is set to zero in the following line.
|
|
|
|
this change is made in preparation for adding search domains, for
which higher-level code will need to parse resolv.conf. simply parsing
it twice for each lookup would be one reasonable option, but the
existing parser code was buggy anyway, which suggested to me that it's
a bad idea to have two variants of this code in two different places.
the old code in res_msend potentially misinterpreted overly long lines
in resolv.conf, and stopped parsing after it found 3 nameservers, even
if there were relevant options left to be parsed later in the file.
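
the two parser pitfalls mentioned above can be illustrated with a small sketch. it parses from an in-memory string instead of a FILE to stay self-contained; the struct, the 256-byte line limit, the names, and the choice to skip overly long lines entirely are assumptions for illustration, not the shared parser itself.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAXNS 3
#define LINE_CAP 256   /* assumed line buffer size */

struct resolv_conf {
    char ns[MAXNS][64];
    int nns;
    int ndots;
};

/* Hypothetical parser: handles only "nameserver" and "options ndots:N". */
void parse_resolv_text(const char *text, struct resolv_conf *conf)
{
    char line[LINE_CAP];
    conf->nns = 0;
    conf->ndots = 1;
    while (*text) {
        size_t len = strcspn(text, "\n");
        const char *next = text + len + (text[len] == '\n');
        if (len >= sizeof line) {   /* too long: skip rather than misparse */
            text = next;
            continue;
        }
        memcpy(line, text, len);
        line[len] = 0;
        text = next;

        if (!strncmp(line, "nameserver", 10) && isspace((unsigned char)line[10])) {
            if (conf->nns >= MAXNS) continue;   /* keep going: options may follow */
            char *p = line + 11;
            while (isspace((unsigned char)*p)) p++;
            size_t n = strcspn(p, " \t");
            if (n == 0 || n >= sizeof conf->ns[0]) continue;
            memcpy(conf->ns[conf->nns], p, n);
            conf->ns[conf->nns][n] = 0;
            conf->nns++;
        } else if (!strncmp(line, "options", 7) && isspace((unsigned char)line[7])) {
            char *p = strstr(line, "ndots:");
            if (p && isdigit((unsigned char)p[6]))
                conf->ndots = (int)strtol(p + 6, NULL, 10);
        }
    }
}
```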
|
|
previously, transient failures like fd exhaustion or other
resource-related errors were treated the same as non-existence of
these files, leading to fallbacks or false-negative results. in
particular:
- failure to open hosts resulted in fallback to dns, possibly yielding
EAI_NONAME for a hostname that should be defined locally, or an
unwanted result from dns that the hosts file was intended to
replace.
- failure to open services resulted in EAI_SERVICE.
- failure to open resolv.conf resulted in querying localhost rather
than the configured nameservers.
now, only permanent errors trigger the fallback behaviors above; all
other errors are reportable to the caller as EAI_SYSTEM.
|
|
the results of a dns query, whether it's performed as part of one of
the standard name-resolving functions or directly by res_send, should
be a function of the query, not of the particular nameserver that
responds to it. thus, all responses which indicate a failure or
refusal by the nameserver, as opposed to a positive or negative result
for the query, should be ignored.
the strategy used is to re-issue the query immediately (but with a
limit on the number of retries, in case the server is really broken)
when a response code of 2 (server failure, typically transient) is
seen, and otherwise take no action on bad responses (which generally
indicate a misconfigured nameserver or one which the client does not
have permission to use), allowing the normal retry interval to apply
and of course accepting responses from other nameservers queried in
parallel.
empirically this matches the traditional resolver behavior for
nameservers that respond with a code of 2 in the case where there is
just a single nameserver configured. the behavior diverges when
multiple nameservers are available, since musl is querying them in
parallel. in this case we are mildly more aggressive at retrying.
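
the decision described above can be sketched as a classifier over the response code, which occupies the low four bits of the fourth header byte. codes 0 (no error) and 3 (name does not exist) are the positive and negative results; the enum, the function name, and the retry counter parameter are hypothetical.

```c
#include <stddef.h>

enum dns_action { DNS_ACCEPT, DNS_RETRY_NOW, DNS_IGNORE };

/* Hypothetical classifier: decide what to do with one response.
 * *servfail_retries_left bounds immediate retries on rcode 2. */
enum dns_action classify_response(const unsigned char *resp, size_t len,
                                  int *servfail_retries_left)
{
    if (len < 4) return DNS_IGNORE;
    switch (resp[3] & 15) {
    case 0:   /* positive result */
    case 3:   /* negative result (name does not exist) */
        return DNS_ACCEPT;
    case 2:   /* server failure, typically transient */
        if (*servfail_retries_left > 0) {
            (*servfail_retries_left)--;
            return DNS_RETRY_NOW;
        }
        return DNS_IGNORE;
    default:  /* refusal or other failure: wait for the normal retry */
        return DNS_IGNORE;
    }
}
```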
|
|
this also affects the legacy gethostbyaddr family, which uses
getnameinfo as its backend.
some other minor changes associated with the refactoring of source
files are also made; in particular, the resolv.conf parser now uses
the same code that's used elsewhere to handle ip literals, so as a
side effect it can now accept a scope id for nameserver addresses with
link-local scope.
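
for illustration, a scoped literal such as fe80::1%2 can be split at the percent sign and parsed in two parts. this sketch handles numeric scope ids only (interface names would need a lookup such as if_nametoindex), and the function name is hypothetical; it is not the shared ip-literal code referred to above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for "addr" or "addr%N" with a numeric scope id.
 * Returns 0 on success, -1 on malformed input. */
int parse_v6_with_scope(const char *s, struct sockaddr_in6 *out)
{
    char addr[INET6_ADDRSTRLEN];
    const char *pct = strchr(s, '%');
    size_t n = pct ? (size_t)(pct - s) : strlen(s);
    if (n >= sizeof addr) return -1;
    memcpy(addr, s, n);
    addr[n] = 0;

    memset(out, 0, sizeof *out);
    out->sin6_family = AF_INET6;
    if (inet_pton(AF_INET6, addr, &out->sin6_addr) != 1) return -1;

    if (pct) {
        char *end;
        if (pct[1] < '0' || pct[1] > '9') return -1;   /* digits only */
        unsigned long id = strtoul(pct + 1, &end, 10);
        if (*end) return -1;
        out->sin6_scope_id = (unsigned)id;
    }
    return 0;
}
```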
|
|
|
|
this is the second phase of the "resolver overhaul" project.
the key additions in this commit are the __res_msend and __res_mkquery
functions, which have been factored so as to provide a backend for
both the legacy res_* functions and the standard getaddrinfo and
getnameinfo functions. the latter however are still using the old
backend code; there is code duplication which still needs to be
removed, and this will be the next phase of the resolver overhaul.
__res_msend is derived from the old __dns_doqueries function, but
generalized to send arbitrary caller-provided packets in parallel
rather than producing the parallel queries itself. this allows it to
be used (completely trivially) as a backend for res_send. the
factored-out query generation code, with slightly more generality, is
now part of __res_mkquery.
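
to make the query-generation step concrete, here is a simplified sketch of building a single-question packet: a 12-byte header, the name as length-prefixed labels, then type and class. it is not __res_mkquery itself; the name `build_query`, its parameters, and the caller-supplied id are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical builder. Returns the packet length, or -1 if the name
 * is malformed or the buffer is too small. */
int build_query(const char *name, int qclass, int qtype, unsigned id,
                unsigned char *buf, size_t buflen)
{
    size_t l = strlen(name);
    if (l && name[l - 1] == '.') l--;          /* ignore one trailing dot */
    if (l == 0 || l > 253 || buflen < 12 + l + 2 + 4) return -1;

    memset(buf, 0, 12);
    buf[0] = (unsigned char)(id >> 8);
    buf[1] = (unsigned char)id;
    buf[2] = 0x01;                             /* RD: recursion desired */
    buf[5] = 1;                                /* QDCOUNT = 1 */

    size_t pos = 12, i = 0;
    for (;;) {
        size_t j = i;
        while (j < l && name[j] != '.') j++;
        size_t lab = j - i;
        if (lab == 0 || lab > 63) return -1;   /* empty or oversized label */
        buf[pos++] = (unsigned char)lab;
        memcpy(buf + pos, name + i, lab);
        pos += lab;
        if (j == l) break;
        i = j + 1;
    }
    buf[pos++] = 0;                            /* root label ends the name */
    buf[pos++] = (unsigned char)(qtype >> 8);
    buf[pos++] = (unsigned char)qtype;
    buf[pos++] = (unsigned char)(qclass >> 8);
    buf[pos++] = (unsigned char)qclass;
    return (int)pos;
}
```

several packets built this way (for example one A and one AAAA question) could then be handed together to a send routine that issues them in parallel, which is the division of labor the commit describes.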
|