  1. AMQ Interconnect
  2. ENTMQIC-2023

qdrouterd accumulates memory on connections with link attach+detach loop

    Details

    • Type: Bug
    • Status: Done
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 1.0.0.GA
    • Fix Version/s: 1.2.0.CR4
    • Component/s: Qpid Dispatch Router
    • Labels:
      None
    • Environment:

      qpid-proton-c-0.18.1-2.el7.x86_64
      qpid-dispatch-router-1.0.0-6.el7.x86_64

    • Target Release:
    • Sprint:
      Interconnect - June Sprint
    • Steps to Reproduce:

      Reproducer: Python script (non-threaded version):

      import random
      from proton.utils import BlockingConnection
      from time import sleep
      from uuid import uuid4
      
      ROUTER_ADDRESS = "proton+amqp://0.0.0.0:5672"
      ADDRESS = "test.address"
      HEARTBEAT = 5
      SLEEP_MIN = 0.1
      SLEEP_MAX = 0.2
      
      conn = BlockingConnection(ROUTER_ADDRESS, ssl_domain=None, heartbeat=HEARTBEAT)
      
      while True:
        recv = conn.create_receiver('%s' %(ADDRESS), name=str(uuid4()), dynamic=False, options=None)
        sleep(random.uniform(SLEEP_MIN,SLEEP_MAX))
        recv.close()
        sleep(random.uniform(SLEEP_MIN,SLEEP_MAX))
      

      Threaded version, which reproduces the problem faster:

      import random
      import threading
      from proton.utils import BlockingConnection
      from time import sleep
      from uuid import uuid4
      
      ROUTER_ADDRESS = "proton+amqp://0.0.0.0:5672"
      ADDRESS = "test.address"
      HEARTBEAT = 5
      SLEEP_MIN = 0.1
      SLEEP_MAX = 0.2
      THREADS = 40
      ADDRESSES = 20
      
      class ReceiverThread(threading.Thread):
          def __init__(self, address=ADDRESS):
              super(ReceiverThread, self).__init__()
              self.address = address
      
          def run(self):
              # One long-lived connection per thread; only the links are cycled.
              self.conn = BlockingConnection(ROUTER_ADDRESS, ssl_domain=None, heartbeat=HEARTBEAT)
              while True:
                  self.recv = self.conn.create_receiver('%s' %(self.address), name=str(uuid4()), dynamic=False, options=None)
                  sleep(random.uniform(SLEEP_MIN,SLEEP_MAX))
                  self.recv.close()
                  sleep(random.uniform(SLEEP_MIN,SLEEP_MAX))
      
      threads = []
      for i in range(THREADS):
        threads.append(ReceiverThread('%s.%s' %(ADDRESS, i%ADDRESSES)))
        threads[i].start()
      
      for i in range(THREADS):
          threads[i].join()      # technically, we will never pass this line
      

      Run the script and watch the memory usage of qdrouterd grow. A further diagnostic step requires gdb. Before killing the reproducer script, attach to the router:

      gdb -p $(pgrep qdrouterd) $(which qdrouterd)
      

      Set a breakpoint on pn_list_remove and continue, then kill the reproducer. The breakpoint is hit with this backtrace:

      #0  pn_list_remove (list=0x244a1b0, value=value@entry=0x23b19a0) at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/core/object/list.c:97
      #1  0x00007fd029595615 in pni_remove_link (ssn=0x2449b20, link=0x23b19a0) at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/core/engine.c:310
      #2  0x00007fd029597b38 in pn_link_free (link=0x23b19a0) at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/core/engine.c:351
      #3  0x00007fd029597bc3 in pn_session_free (session=0x2449b20) at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/core/engine.c:270
      #4  0x00007fd029597c7c in pn_connection_release (connection=connection@entry=0x7fd00c02fc30) at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/core/engine.c:124
      #5  0x00007fd029597cd9 in pn_connection_free (connection=0x7fd00c02fc30) at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/core/engine.c:145
      #6  0x00007fd029594cc6 in pn_connection_driver_destroy (d=d@entry=0x7fd00cf96608) at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/core/connection_driver.c:93
      #7  0x00007fd0293746cc in pconnection_final_free (pc=pc@entry=0x7fd00cf96060) at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/proactor/epoll.c:827
      #8  0x00007fd029375544 in pconnection_cleanup (pc=pc@entry=0x7fd00cf96060) at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/proactor/epoll.c:843
      #9  0x00007fd029376532 in pconnection_process (pc=0x7fd00cf96060, events=<optimized out>, events@entry=0, timeout=timeout@entry=true, topup=topup@entry=false)
          at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/proactor/epoll.c:1178
      #10 0x00007fd029376f1b in proactor_do_epoll (p=0x2263eb0, can_block=can_block@entry=true) at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/proactor/epoll.c:2013
      #11 0x00007fd029377e9a in pn_proactor_wait (p=<optimized out>) at /usr/src/debug/qpid-proton-0.22.0/proton-c/src/proactor/epoll.c:2028
      #12 0x00007fd0298065a9 in thread_run (arg=0x226e260) at /usr/src/debug/qpid-dispatch-1.0.0/src/server.c:932
      #13 0x00007fd02915ce25 in start_thread (arg=0x7fd016ffd700) at pthread_create.c:308
      #14 0x00007fd02849034d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
      

      At that breakpoint, "p list->size" shows an arbitrarily high number, even though each connection has at most one link at a time.
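
To make the "observe memory utilization" step less manual, the router's resident set size can be sampled while the reproducer runs. The following is only a minimal sketch, assuming a Linux host where /proc/<pid>/status is readable; the helper names read_rss_kib and sample_rss are illustrative and not part of qpid-dispatch or qpid-proton.

```python
import time


def read_rss_kib(pid):
    """Return the resident set size of `pid` in KiB, read from /proc (Linux only)."""
    with open("/proc/%d/status" % pid) as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # Line format: "VmRSS:     12345 kB"
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found for pid %d" % pid)


def sample_rss(pid, samples=10, interval=1.0):
    """Collect `samples` RSS readings for `pid`, sleeping `interval` seconds after each."""
    readings = []
    for _ in range(samples):
        readings.append(read_rss_kib(pid))
        time.sleep(interval)
    return readings
```

For example, calling sample_rss(router_pid, samples=60, interval=10.0) with the pid reported by "pgrep qdrouterd" gives ten minutes of readings. On an affected router the values would be expected to keep rising for as long as the reproducer's connections stay open.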

    • Affects:
      Release Notes
    • Release Notes Text:
      In previous versions of AMQ Interconnect, a defect caused memory usage to build up when a long-lived connection carried many link attaches and detaches. The memory for the links was not freed until the connection was closed. This defect has been corrected, and link memory is reclaimed when the link is closed.
    • Release Notes Docs Status:
      Documented as Resolved Issue

      Description

      When a client connection repeatedly attaches and detaches a link, qdrouterd accumulates memory until the AMQP connection is closed. This is fatal for long-lived or permanent client connections that, for example, occasionally or frequently create a receiver and close it when it is no longer required.
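
Because the accumulated memory is released when the AMQP connection closes, a client on an affected router version could limit the growth by reopening its connection after a bounded number of attach/detach cycles. The sketch below is an assumption-laden illustration, not a recommended or tested fix: the class RecyclingReceiverFactory and its max_cycles parameter are hypothetical, and it relies only on the connection object offering create_receiver() and close(), as proton.utils.BlockingConnection does in the reproducer above.

```python
class RecyclingReceiverFactory(object):
    """Hypothetical helper: reopen the connection every `max_cycles` receivers.

    Assumes the caller closes each receiver before asking for the next one,
    as the reproducer does; recycling closes the old connection outright.
    """

    def __init__(self, connect, max_cycles=1000):
        self._connect = connect        # callable returning a new connection
        self._max_cycles = max_cycles  # receivers created per connection
        self._cycles = 0
        self._conn = None

    def create_receiver(self, address, **kwargs):
        # Open the first connection, or replace one that has used up its budget.
        if self._conn is None or self._cycles >= self._max_cycles:
            if self._conn is not None:
                self._conn.close()
            self._conn = self._connect()
            self._cycles = 0
        self._cycles += 1
        return self._conn.create_receiver(address, **kwargs)

    def close(self):
        # Close the current connection, if any.
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

With the reproducer's names, the factory could be built as RecyclingReceiverFactory(lambda: BlockingConnection(ROUTER_ADDRESS, heartbeat=HEARTBEAT)) and used in place of conn inside the loop. Passing the connection constructor as a callable keeps the helper independent of proton, so it can be exercised with a stub connection.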

                People

                • Assignee:
                  aconway Alan Conway
                  Reporter:
                  pmoravec Pavel Moravec
