
Default AdvertiseBindAddress value should not be NULL (UDP Multicast on Linux systems with more NICs)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.2.11.Final, 1.3.1.Final
    • Fix Version/s: 2.0.0.Alpha1
    • Component/s: Native (httpd modules)
    • Labels:
      None
    • Environment:

      Linux, multiple NICs environment

      Description

      Credit where it's due: the issue was first spotted by Radim Hatlapatka.

      Problem

      It appears that trying to send to all interfaces with NULL or "0.0.0.0" (the default bindaddr when no AdvertiseBindAddress is set) in the following statement actually picks the first non-loopback interface and sends only through it.

          if ((rv = apr_sockaddr_info_get(&ma_listen_sa, bindaddr,
                                          ma_mgroup_sa->family, bindport,
                                          APR_UNSPEC, pool)) != APR_SUCCESS) {
              ap_log_error(APLOG_MARK, APLOG_ERR, rv, s,
                           "mod_advertise: ma_group_join apr_sockaddr_info_get(%s:%d) failed", bindaddr, bindport);
      

      The result is that no datagrams show up on the other interfaces. Surprisingly, this is not deterministic: after dozens or hundreds of messages, a datagram eventually reaches another interface.
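      For context, here is a minimal sketch with plain POSIX sockets rather than APR. A sender can pin outgoing multicast to one NIC with the IP_MULTICAST_IF socket option instead of leaving the choice to the kernel. The helper name pin_mcast_interface is made up for illustration and is not part of mod_advertise.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Pin outgoing IPv4 multicast on fd to the NIC that owns iface_ip.
 * Passing "0.0.0.0" hands the choice back to the kernel.
 * Returns 0 on success, -1 on a malformed address or setsockopt failure.
 * Hypothetical helper, not taken from mod_advertise. */
int pin_mcast_interface(int fd, const char *iface_ip)
{
    struct in_addr iface;

    memset(&iface, 0, sizeof(iface));
    if (inet_pton(AF_INET, iface_ip, &iface) != 1)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &iface, sizeof(iface));
}
```

      Calling pin_mcast_interface(fd, "172.17.72.254") before sending has a similar intent to passing an explicit NIC address to the sender, as opposed to relying on the wildcard address.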

      Impact

      Picture this simple scenario: There are two interfaces, e.g.

      enp1s0 10.16.88.187
      enp2s0 172.18.0.1
      

      listed in this exact order with ip addr show.

      One has an EAP 7 (WildFly 10) instance with mod_cluster bound to the IP address 172.18.0.1, which implies the enp2s0 interface.

      Furthermore, one has an Apache HTTP Server instance with mod_cluster bound to the same IP address, 172.18.0.1, i.e. the MCMP VirtualHost and the main VirtualHost both Listen on this address.

      Result: Without advertising, using an explicit proxy-list, all is well. MCMP works, requests work, balancing works.
      On the other hand, when relying on advertisement, it can take EAP 7 (WildFly 10) minutes to register with the balancer.
      The reason is that the vast majority of UDP Multicast datagrams arrive at enp1s0, where EAP 7 (WildFly 10) doesn't see them.
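      To illustrate the receiving side, here is a hedged sketch in plain POSIX C (not the code of EAP or of any utility mentioned here): a receiver that joins a multicast group on one specific NIC only gets datagrams arriving on that NIC. The function build_membership is a hypothetical helper that prepares the membership request.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Fill mreq for joining multicast group `group` on the NIC owning iface_ip.
 * A NULL iface_ip selects INADDR_ANY, i.e. the kernel picks the interface.
 * Returns 0 on success, -1 if an address is malformed or `group` is not
 * an IPv4 multicast address. Hypothetical helper for illustration. */
int build_membership(const char *group, const char *iface_ip,
                     struct ip_mreq *mreq)
{
    memset(mreq, 0, sizeof(*mreq));
    if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
        return -1;
    if (!IN_MULTICAST(ntohl(mreq->imr_multiaddr.s_addr)))
        return -1;
    if (iface_ip == NULL) {
        mreq->imr_interface.s_addr = htonl(INADDR_ANY);
        return 0;
    }
    return inet_pton(AF_INET, iface_ip, &mreq->imr_interface) == 1 ? 0 : -1;
}
```

      The filled structure would then be passed to setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) on a UDP socket bound to the advertise port.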

      Reproducer

      Let me demonstrate with a recently refactored advertise.c utility for sending datagrams and the well-known Advertize.java utility for receiving them.
      For your convenience, here are binaries built from the aforementioned sources:

      Demonstration on Linux

      System

      [mbabacek@perf09 ~]$ ip addr show
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
      2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
          link/ether 00:18:8b:7a:46:04 brd ff:ff:ff:ff:ff:ff
          inet 10.16.88.187/21 brd 10.16.95.255 scope global enp1s0
             valid_lft forever preferred_lft forever
          inet 10.16.93.253/21 brd 10.16.95.255 scope global secondary enp1s0
             valid_lft forever preferred_lft forever
      3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
          link/ether 00:18:8b:7a:46:05 brd ff:ff:ff:ff:ff:ff
          inet 172.17.72.254/19 brd 172.17.95.255 scope global enp2s0
             valid_lft forever preferred_lft forever
      4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN 
          link/ether 02:42:07:ab:74:f9 brd ff:ff:ff:ff:ff:ff
          inet 172.18.0.1/16 scope global docker0
             valid_lft forever preferred_lft forever
      

      Java

      [mbabacek@perf09 ~]$ java -version
      openjdk version "1.8.0_71"
      OpenJDK Runtime Environment (build 1.8.0_71-b15)
      OpenJDK 64-Bit Server VM (build 25.71-b15, mixed mode)
      

      Advertise SENT

      [mbabacek@perf09 ~]$ date;./advertise -a 224.0.1.102 -p 33364
      Mon Mar 21 12:39:51 EDT 2016
      UDP Multicast address to send datagrams to. Value: 224.0.1.102
      UDP Multicast port. Value: 33364
      IP address of the NIC to bound to. Value: NULL
      apr_socket_bind on 0.0.0.0:0
      apr_mcast_join on 0.0.0.0:0
      apr_socket_sendto to 224.0.1.102:33364
      

      Advertize RECEIVED

      YES

      [mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364
      Linux like OS
      ready waiting...
      received: Advertize !!! Mon, 21 Mar 2016 16:39:51 GMT
      received from /10.16.88.187:38907
      

      YES

      [mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 10.16.88.187
      Linux like OS
      ready waiting...
      received: Advertize !!! Mon, 21 Mar 2016 16:39:51 GMT
      received from /10.16.88.187:38907
      

      NO

      [mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 172.17.72.254
      Linux like OS
      ready waiting...
      

      YES

      [mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 0.0.0.0
      Linux like OS
      ready waiting...
      received: Advertize !!! Mon, 21 Mar 2016 16:39:51 GMT
      received from /10.16.88.187:38907
      

      And now let's send from 172.17.72.254, i.e. enp2s0, instead.

      Advertise SENT

      [mbabacek@perf09 ~]$ date;./advertise -a 224.0.1.102 -p 33364 -n 172.17.72.254
      Mon Mar 21 12:42:57 EDT 2016
      UDP Multicast address to send datagrams to. Value: 224.0.1.102
      UDP Multicast port. Value: 33364
      IP address of the NIC to bound to. Value: 172.17.72.254
      apr_socket_bind on 172.17.72.254:0
      apr_mcast_join on 172.17.72.254:0
      apr_socket_sendto to 224.0.1.102:33364
      

      Advertize RECEIVED

      NO

      [mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364
      Linux like OS
      ready waiting...
      

      NO

      [mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 10.16.88.187
      Linux like OS
      ready waiting...
      

      YES

      [mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 172.17.72.254
      Linux like OS
      ready waiting...
      received: Advertize !!! Mon, 21 Mar 2016 16:42:57 GMT
      received from /172.17.72.254:35452
      

      NO

      [mbabacek@perf09 ~]$ java Advertize 224.0.1.102 33364 0.0.0.0
      Linux like OS
      ready waiting...
      

      Demonstration on Windows

      Note that the problem does not occur on Windows: all interfaces receive the advertisement.

      Advertise SENT

      C:\Users\karm\advertise-build
      λ advertise.exe -a 224.0.1.102 -p 33364
      UDP Multicast address to send datagrams to. Value: 224.0.1.102
      UDP Multicast port. Value: 33364
      IP address of the NIC to bound to. Value: NULL
      apr_socket_bind on 0.0.0.0:0
      apr_mcast_join on 0.0.0.0:0
      apr_socket_sendto to 224.0.1.102:33364
      

      Advertize RECEIVED

      YES

      C:\Users\karm\WORKSPACE
      λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364
      ready waiting...
      received: Advertize !!! Mon, 21 Mar 2016 18:07:50 GMT
      received from /192.168.122.52:61805
      

      YES

      C:\Users\karm\WORKSPACE
      λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364 192.168.122.52
      ready waiting...
      received: Advertize !!! Mon, 21 Mar 2016 18:07:50 GMT
      received from /192.168.122.52:61805
      

      YES

      C:\Users\karm\WORKSPACE
      λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364 192.168.122.199
      ready waiting...
      received: Advertize !!! Mon, 21 Mar 2016 18:07:50 GMT
      received from /192.168.122.52:61805
      

      Advertise SENT

      C:\Users\karm\advertise-build
      λ advertise.exe -a 224.0.1.102 -p 33364 -n 192.168.122.199
      UDP Multicast address to send datagrams to. Value: 224.0.1.102
      UDP Multicast port. Value: 33364
      IP address of the NIC to bound to. Value: 192.168.122.199
      apr_socket_bind on 192.168.122.199:0
      apr_mcast_join on 192.168.122.199:0
      apr_socket_sendto to 224.0.1.102:33364
      

      Advertize RECEIVED

      YES

      C:\Users\karm\WORKSPACE
      λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364
      ready waiting...
      received: Advertize !!! Mon, 21 Mar 2016 18:09:55 GMT
      received from /192.168.122.199:52781
      

      YES

      C:\Users\karm\WORKSPACE
      λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364 192.168.122.52
      ready waiting...
      received: Advertize !!! Mon, 21 Mar 2016 18:09:55 GMT
      received from /192.168.122.199:52781
      

      YES

      C:\Users\karm\WORKSPACE
      λ "C:\Program Files\Java\jdk1.8.0_74\bin\java" Advertize 224.0.1.102 33364 192.168.122.199
      ready waiting...
      received: Advertize !!! Mon, 21 Mar 2016 18:09:55 GMT
      received from /192.168.122.199:52781
      

      Suggestion

      Ideas? Jean-Frederic Clere, Radoslav Husar
      I suggest defaulting bindaddr (AdvertiseBindAddress) to the main_server's address or to the address of the MCMP-enabled vhost instead of NULL. I'll post a PR for evaluation.
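      A rough sketch of the fallback rule suggested here, in plain C. This is not the PR, only an illustration of the intended decision. The function name effective_bind_addr and its server_addr parameter are assumptions standing in for whatever mod_advertise would actually consult.

```c
#include <stddef.h>
#include <string.h>

/* Decide which address the advertise socket should bind to.
 * configured:  value of AdvertiseBindAddress, or NULL when unset.
 * server_addr: hypothetical fallback, e.g. the MCMP-enabled vhost address;
 *              may be NULL when no such address is known.
 * An unset, empty, or "0.0.0.0" configured value counts as "no preference". */
const char *effective_bind_addr(const char *configured, const char *server_addr)
{
    int wildcard = configured == NULL || configured[0] == '\0'
                   || strcmp(configured, "0.0.0.0") == 0;

    if (wildcard && server_addr != NULL)
        return server_addr;              /* proposed new default */
    return wildcard ? "0.0.0.0" : configured;
}
```

      With the addresses from the Impact scenario, effective_bind_addr(NULL, "172.18.0.1") yields "172.18.0.1", while an explicitly configured address is always left untouched.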

                People

                • Assignee:
                  mbabacek Michal Karm Babacek
                  Reporter:
                  mbabacek Michal Karm Babacek