RHEL-10722

"host" command doesn't always return even though some DNS server provided an answer

    • Bug
    • Resolution: Not a Bug
    • rhel-8.8.0.z
    • bind
    • Normal
    • sst_cs_infra_services
    • ssg_core_services

      What were you trying to do that didn't work?

      Running a simple "host <fqdn>" command sometimes never returns on the customer's system, which has 4 DNS servers in /etc/resolv.conf and the timeout option set to 1 second:

      options timeout:1
      search SEARCH1 SEARCH2
      nameserver DNSSERVER1
      nameserver DNSSERVER2
      nameserver DNSSERVER3
      nameserver DNSSERVER4
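
      To make the configuration above easier to reason about, here is a minimal sketch that extracts the nameserver order and the timeout option from resolv.conf text. The helper name parse_resolv_conf is hypothetical, and this is not the parser that host or glibc actually uses; it only handles the keywords shown in this report.

```python
def parse_resolv_conf(text):
    """Collect nameservers, search domains and options from resolv.conf text."""
    conf = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            conf["nameservers"].append(values[0])
        elif keyword == "search":
            conf["search"] = values
        elif keyword == "options":
            for opt in values:
                # Options look like "timeout:1" or a bare flag such as "rotate".
                name, _, val = opt.partition(":")
                conf["options"][name] = int(val) if val.isdigit() else True
    return conf


# The redacted configuration from this report.
SAMPLE = """\
options timeout:1
search SEARCH1 SEARCH2
nameserver DNSSERVER1
nameserver DNSSERVER2
nameserver DNSSERVER3
nameserver DNSSERVER4
"""

conf = parse_resolv_conf(SAMPLE)
print("servers in order:", conf["nameservers"])
print("timeout (s):", conf["options"].get("timeout"))
```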

      A collected coredump shows a task still pending, which explains why the command doesn't return.

      An strace (taken at a different time) shows a message sent to 127.0.0.1 immediately followed by a query to DNSSERVER1, then 1 second later queries to DNSSERVER2 and DNSSERVER3:

      2140968 11:00:42.560872 sendmsg(6<UDP:[127.0.0.1:53774]>, {msg_name={sa_family=AF_INET, sin_port=htons(53774), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=16, msg_iov=[{iov_base="\0", iov_len=1}], msg_iovlen=1, msg_control=[{cmsg_len=17, cmsg_level=SOL_IP, cmsg_type=IP_TOS, cmsg_data=[0xb8]}], msg_controllen=24, msg_flags=0}, 0) = 1 <0.000058>
      2140968 11:00:42.563877 sendmsg(20<UDP:[0.0.0.0:44950]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER1")}, msg_namelen=16, msg_iov=[{iov_base="\354\277\1\0\0\1\0\0\0\0\0\0REDACTED_FQDN\0\0\1\0\1", iov_len=42}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0 <unfinished ...>
      
      2140968 11:00:43.561830 sendmsg(21<UDP:[0.0.0.0:35337]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER2")}, msg_namelen=16, msg_iov=[{iov_base="\354\277\1\0\0\1\0\0\0\0\0\0REDACTED_FQDN\0\0\1\0\1", iov_len=42}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0 <unfinished ...>
      2140968 11:00:43.563750 sendmsg(22<UDP:[0.0.0.0:54657]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER3")}, msg_namelen=16, msg_iov=[{iov_base="\354\277\1\0\0\1\0\0\0\0\0\0REDACTED_FQDN\0\0\1\0\1", iov_len=42}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0 <unfinished ...>
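
      The timing in the trace above can be checked mechanically. The following is a rough sketch, assuming strace output in the same format as the lines above (pid, wall-clock timestamp, then the syscall); the names SENDMSG, to_seconds and dns_timeline are made up for this illustration, and the sample lines are shortened copies of the trace. Filtering on destination port 53 drops the 127.0.0.1 message, which goes to port 53774.

```python
import re

# pid, timestamp, then the first destination port/address inside sendmsg(...)
SENDMSG = re.compile(
    r'^\s*(?P<pid>\d+)\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+sendmsg\('
    r'.*?sin_port=htons\((?P<port>\d+)\),\s*sin_addr=inet_addr\("(?P<addr>[^"]+)"\)'
)


def to_seconds(stamp):
    """Convert an HH:MM:SS.ffffff timestamp to seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def dns_timeline(lines):
    """Return (offset_in_seconds, address) for each sendmsg() to port 53."""
    events = []
    for line in lines:
        match = SENDMSG.match(line)
        if match and match.group("port") == "53":
            events.append((to_seconds(match.group("time")), match.group("addr")))
    if not events:
        return []
    start = events[0][0]
    return [(round(t - start, 6), addr) for t, addr in events]


# Shortened copies of the strace lines from this report.
TRACE = [
    '2140968 11:00:42.560872 sendmsg(6<UDP:[127.0.0.1:53774]>, {msg_name={sa_family=AF_INET, sin_port=htons(53774), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=16}, 0) = 1',
    '2140968 11:00:42.563877 sendmsg(20<UDP:[0.0.0.0:44950]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER1")}, msg_namelen=16}, 0 <unfinished ...>',
    '2140968 11:00:43.561830 sendmsg(21<UDP:[0.0.0.0:35337]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER2")}, msg_namelen=16}, 0 <unfinished ...>',
    '2140968 11:00:43.563750 sendmsg(22<UDP:[0.0.0.0:54657]>, {msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("DNSSERVER3")}, msg_namelen=16}, 0 <unfinished ...>',
]

timeline = dns_timeline(TRACE)
for offset, addr in timeline:
    print(f"+{offset:.6f}s  {addr}")
```

      On these lines, the queries to DNSSERVER2 and DNSSERVER3 land just under 1 second after the query to DNSSERVER1, which matches the configured timeout.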
      

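      It's unclear why 127.0.0.1 is contacted since it's not in /etc/resolv.conf; that message goes to port 53774 with a 1-byte payload rather than to port 53, so it does not look like a DNS query.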

      Anyway, because DNSSERVER1 doesn't answer within the 1-second timeout, DNSSERVER2 and DNSSERVER3 are queried. A result is received from the latter two, but never from DNSSERVER1.

      Upon receiving a result, I would expect the command to terminate, but for some reason it doesn't.

      It therefore seems to me that the cancellation code is not always working.
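
      To illustrate the behaviour expected here (return on the first answer and cancel whatever is still outstanding), below is a small simulation using asyncio. It is only a sketch of that expectation, not how host or BIND is implemented: fake_query and first_answer are hypothetical names, no network traffic is sent, and the answer 192.0.2.10 is a placeholder from the documentation address range.

```python
import asyncio


async def fake_query(server, delay, answer):
    """Stand-in for one DNS query; delay=None simulates a server that never answers."""
    if delay is None:
        await asyncio.Event().wait()  # blocks until the task is cancelled
    await asyncio.sleep(delay)
    return server, answer


async def first_answer(queries, overall_timeout):
    """Return the first completed result and cancel every query still pending."""
    tasks = [asyncio.ensure_future(q) for q in queries]
    try:
        done, _pending = await asyncio.wait(
            tasks, timeout=overall_timeout, return_when=asyncio.FIRST_COMPLETED
        )
        if not done:
            raise TimeoutError("no server answered")
        return next(iter(done)).result()
    finally:
        # Cancel the stragglers so nothing keeps the program from exiting.
        for task in tasks:
            if not task.done():
                task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def main():
    queries = [
        fake_query("DNSSERVER1", None, None),          # never answers
        fake_query("DNSSERVER2", 0.05, "192.0.2.10"),  # answers first
        fake_query("DNSSERVER3", 0.20, "192.0.2.10"),  # answers later
    ]
    return await first_answer(queries, overall_timeout=1.0)


result = asyncio.run(main())
print(result)
```

      In this model the silent server cannot block completion, because its task is cancelled as soon as another one finishes.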

       

      I have not managed to reproduce the behaviour: with 4 DNS servers in /etc/resolv.conf and the first one not responding at all, strace shows that DNS servers 3 and 4 are never queried, but I see retries on the first and second ones.

      Please provide the package NVR for which bug is seen:

      bind-utils-9.11.36-3.el8.x86_64

      How reproducible:

      Often, on the customer's system

            pemensik@redhat.com Petr Mensik
            rhn-support-rmetrich Renaud Metrich
            Petr Mensik Petr Mensik
            rhel-cs-infra-services-qe