-
Bug
-
Resolution: Not a Bug
-
Undefined
-
None
-
rhel-8.8.0.z
-
None
-
Normal
-
sst_cs_infra_services
-
ssg_core_services
-
False
-
-
-
All
What were you trying to do that didn't work?
Executing a simple "host <FQDN>" command sometimes never returns on a customer's system that has 4 DNS servers in /etc/resolv.conf and the timeout option set to 1 second:
options timeout:1
search SEARCH1 SEARCH2
nameserver DNSSERVER1
nameserver DNSSERVER2
nameserver DNSSERVER3
nameserver DNSSERVER4
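To make the configuration above easier to check on an affected system, here is a minimal Python sketch that reads resolv.conf-style text and reports the timeout, search list and nameservers. The function name `parse_resolv_conf` and the `SAMPLE` text are illustrative assumptions; the sketch only mirrors the file syntax and does not claim to reproduce how `host` itself interprets these options.

```python
def parse_resolv_conf(text):
    """Parse resolv.conf-style text into options, search list and nameservers.

    Hypothetical helper for illustration; it is not the parser used by bind-utils.
    """
    conf = {"options": {}, "search": [], "nameservers": []}
    for raw in text.splitlines():
        # Drop comments introduced by '#' or ';' and surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        keyword, *args = line.split()
        if keyword == "options":
            for opt in args:
                # Options look like "timeout:1" or a bare flag such as "rotate".
                name, _, value = opt.partition(":")
                conf["options"][name] = int(value) if value.isdigit() else True
        elif keyword == "search":
            conf["search"] = args
        elif keyword == "nameserver" and args:
            conf["nameservers"].append(args[0])
    return conf


# Placeholder values copied from the report above.
SAMPLE = """\
options timeout:1
search SEARCH1 SEARCH2
nameserver DNSSERVER1
nameserver DNSSERVER2
nameserver DNSSERVER3
nameserver DNSSERVER4
"""

if __name__ == "__main__":
    conf = parse_resolv_conf(SAMPLE)
    print("timeout:", conf["options"].get("timeout"))
    print("nameservers:", conf["nameservers"])
```

On a real system the text would come from reading /etc/resolv.conf instead of `SAMPLE`.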
A coredump shows that a task is still pending, which explains why the command does not return.
A strace (taken at a different time) shows queries sent to 127.0.0.1 and DNSSERVER1 in quick succession, then 1 second later to DNSSERVER2 and DNSSERVER3:
2140968 11:00:42.560872 sendmsg(6<UDP:[127.0.0.1:53774]>, {msg_name={sa_family=AF_INET, sin_port=htons(53774), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=16, msg_iov=[{iov_base="\0", iov_len=1}], msg_iovlen=1, msg_control=[{cmsg_len=17, cmsg_level=SOL_IP, cmsg_type=IP_TOS, cmsg_data=[0xb8]}], msg_controllen=24, msg_flags=0}, 0) = 1 <0.000058>
2140968 11:00:42.563877 sendmsg(20<UDP:[0.0.0.0:44950]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER1")}, msg_namelen=16, msg_iov=[{iov_base="\354\277\1\0\0\1\0\0\0\0\0\0REDACTED_FQDN\0\0\1\0\1", iov_len=42}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0 <unfinished ...>
2140968 11:00:43.561830 sendmsg(21<UDP:[0.0.0.0:35337]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER2")}, msg_namelen=16, msg_iov=[{iov_base="\354\277\1\0\0\1\0\0\0\0\0\0REDACTED_FQDN\0\0\1\0\1", iov_len=42}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0 <unfinished ...>
2140968 11:00:43.563750 sendmsg(22<UDP:[0.0.0.0:54657]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER3")}, msg_namelen=16, msg_iov=[{iov_base="\354\277\1\0\0\1\0\0\0\0\0\0REDACTED_FQDN\0\0\1\0\1", iov_len=42}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0 <unfinished ...>
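The timing in this trace can be checked mechanically. The following Python sketch extracts the timestamp and destination of each sendmsg call and prints the offset from the first call. The helper name `send_times` and the regular expression are assumptions made for illustration, and the `SAMPLE` lines are abbreviated copies of the trace above (everything after the destination address is replaced by "...").

```python
import re
from datetime import datetime

# Captures the timestamp, destination port and destination address
# of a strace sendmsg line that starts with a PID.
LINE_RE = re.compile(
    r'^\d+\s+(\d{2}:\d{2}:\d{2}\.\d+)\s+sendmsg\(.*?'
    r'sin_port=htons\((\d+)\),\s*sin_addr=inet_addr\("([^"]+)"\)'
)


def send_times(lines):
    """Return (seconds since first send, destination, port) for each sendmsg line."""
    events = []
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        events.append((stamp, match.group(3), int(match.group(2))))
    if not events:
        return []
    start = events[0][0]
    return [
        (round((stamp - start).total_seconds(), 6), dest, port)
        for stamp, dest, port in events
    ]


# Abbreviated copies of the four strace lines from the report.
SAMPLE = [
    '2140968 11:00:42.560872 sendmsg(6<UDP:[127.0.0.1:53774]>, {msg_name={sa_family=AF_INET, sin_port=htons(53774), sin_addr=inet_addr("127.0.0.1")}, ...',
    '2140968 11:00:42.563877 sendmsg(20<UDP:[0.0.0.0:44950]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER1")}, ...',
    '2140968 11:00:43.561830 sendmsg(21<UDP:[0.0.0.0:35337]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER2")}, ...',
    '2140968 11:00:43.563750 sendmsg(22<UDP:[0.0.0.0:54657]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER3")}, ...',
]

if __name__ == "__main__":
    for offset, dest, port in send_times(SAMPLE):
        print(f"+{offset:.6f}s -> {dest}:{port}")
```

For the lines above, the sends to DNSSERVER2 and DNSSERVER3 land roughly 1.001 s and 1.003 s after the first call, which matches the configured 1-second timeout.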
It's unclear why 127.0.0.1 is queried since it's not in /etc/resolv.conf.
Because DNSSERVER1 doesn't answer within the 1-second timeout, DNSSERVER2 and DNSSERVER3 are queried. Results are received from the latter two, but never from DNSSERVER1.
Upon receiving a result, I would expect the command to terminate, but for some reason that is not the case.
It therefore seems to me that the cancellation code is not always working.
I have not managed to reproduce the behaviour: with 4 DNS servers in /etc/resolv.conf and the first one not responding at all, strace shows that DNS servers 3 and 4 are never queried, but I see retries on the first and second ones.
Please provide the package NVR for which bug is seen:
bind-utils-9.11.36-3.el8.x86_64
How reproducible:
Often on customer system