[ISPN-2186] Coordinator tries to install new view after graceful shutdown


    • Type: Bug
    • Resolution: Done
    • Priority: Minor
    • None
    • 5.1.6.FINAL
    • None

      This is not a serious problem: so far the only effect I have discovered is a superfluous DEBUG-level log message.

      This happened in elasticity tests with radargun.
      In these tests we go from 4 nodes to 8 and back to 4.

      See views installed:
      http://www.qa.jboss.com/~mlinhard/hyperion2/run218-radargun-08-elasticity-JDG.6.0.1.ER1/report/loganalysis/views.html

      In this case, each time a node is shut down it happens to be the coordinator (I'm not sure whether this is a coincidence or whether coordinators are picked by their seniority in the cluster).

      The shutdown is performed gracefully via DefaultCacheManager.stop(), and each time this happens I can see CacheViewsManagerImpl attempt to install a new view. The attempt doesn't complete because the coordinator shuts down a few moments later, and the new view is actually established by the new coordinator.

      Log from slave on node0003:

      02:37:32,737 INFO  [org.radargun.stages.KillStage] (pool-1-thread-1) Tearing down cache wrapper.
      02:38:02,752 WARN  [org.infinispan.transaction.TransactionTable] (pool-1-thread-1) ISPN000100: Stopping, but there are 9 local transactions and 0 remote transactions that did not finish in time.
      02:38:02,755 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-3,hyperion1098-55173) Installing new view CacheView{viewId=9, members=[hyperion1099-41789, hyperion1097-42149, hyperion1100-42888, hyperion1102-1099, hyperion1101-56019, hyperion1103-38655, hyperion1096-46484]} for cache x
      02:38:04,279 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-1-thread-1) ISPN000080: Disconnecting and closing JGroups Channel
      02:38:04,641 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-1-thread-1) ISPN000082: Stopping the RpcDispatcher
      02:38:04,816 INFO  [org.radargun.Slave] (pool-1-thread-1) Finished stage: KillStage {tearDown=true, productName='jdg60', useSmartClassLoading=true, slaveIndex=0, activeSlavesCount=8, totalSlavesCount=8, slaves=[0]}
      02:38:04,817 INFO  [org.radargun.Slave] (main) Ack successfully sent to the master
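
A minimal sketch of how the symptom could be spotted automatically in a slave log: it reports the first "Installing new view" message from CacheViewsManagerImpl that appears after the "Tearing down cache wrapper" line. The class and method names (ShutdownViewInstallScan, firstInstallAfterTeardown) are hypothetical, and the line layout is assumed to match the excerpt above; this is not part of radargun or Infinispan.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShutdownViewInstallScan {
    // Assumed layout: "HH:mm:ss,SSS LEVEL [category] (thread) message"
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(\\d{2}:\\d{2}:\\d{2},\\d{3})\\s+(\\w+)\\s+\\[([^\\]]+)\\]\\s+\\(([^)]*)\\)\\s+(.*)$");

    /** Returns the timestamp of the first view installation logged after teardown began, or null. */
    public static String firstInstallAfterTeardown(List<String> lines) {
        boolean tearingDown = false;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // not a log line in the assumed layout
            }
            String category = m.group(3);
            String message = m.group(5);
            if (message.startsWith("Tearing down cache wrapper")) {
                tearingDown = true;
            } else if (tearingDown
                    && category.endsWith("CacheViewsManagerImpl")
                    && message.startsWith("Installing new view")) {
                return m.group(1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Lines copied from the node0003 excerpt.
        List<String> log = Arrays.asList(
            "02:37:32,737 INFO  [org.radargun.stages.KillStage] (pool-1-thread-1) Tearing down cache wrapper.",
            "02:38:02,755 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-3,hyperion1098-55173) Installing new view CacheView{viewId=9, members=[hyperion1099-41789, hyperion1097-42149, hyperion1100-42888, hyperion1102-1099, hyperion1101-56019, hyperion1103-38655, hyperion1096-46484]} for cache x",
            "02:38:04,279 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-1-thread-1) ISPN000080: Disconnecting and closing JGroups Channel");
        System.out.println(firstInstallAfterTeardown(log)); // prints 02:38:02,755
    }
}
```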
      


            Radim Vansa <rvansa@redhat.com> changed the Status of bug 854665 from ON_QA to VERIFIED


            Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 854665 from MODIFIED to ON_QA


            Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 854665 from ASSIGNED to MODIFIED


            Misha H. Ali <mhusnain@redhat.com> made a comment on bug 854665

            Set flag to nominate this bug for 6.2 release notes.


            mark yarborough <myarboro@redhat.com> made a comment on bug 854665

            Not a regression and does not affect data integrity. Setting severity to LOW and moving to 6.1.


            Dan Berindei (Inactive) added a comment:

            A proper fix would require the global components to know that they will be stopped before we start shutting down the caches. But I don't think it's worth doing it just to avoid a DEBUG log message.

            We should instead focus on adding a method to gracefully shut down the entire cluster: ISPN-1239

            Dan Berindei (Inactive) added a comment:

            I see, I misread the description of the bug a bit.

            I fixed the part where the cache view installation commands from the old coordinator reach the new coordinator and break the new coordinator's cache view installation (potentially making it hang). I did not fix the old coordinator attempting to install a new cache view, because CacheViewsManagerImpl only finds out about the shutdown after all the local caches have already been stopped.
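
The ordering problem described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not Infinispan code: ViewManager is a hypothetical stand-in for a global component such as CacheViewsManagerImpl, and the notifyGlobalFirst flag is a hypothetical pre-stop notification of the kind a proper fix would need.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShutdownOrderSketch {
    /** Hypothetical stand-in for a global component that reacts to caches leaving. */
    static class ViewManager {
        boolean stopping;
        final List<String> events;

        ViewManager(List<String> events) {
            this.events = events;
        }

        void onCacheLeft(String cache) {
            // Without an early notification the manager cannot tell a local
            // shutdown from an ordinary membership change, so it installs a view.
            events.add((stopping ? "skip view install for " : "install new view for ") + cache);
        }

        void stop() {
            stopping = true;
            events.add("view manager stopped");
        }
    }

    /** Stops the caches first and the global component last, recording what happens. */
    public static List<String> shutDown(List<String> caches, boolean notifyGlobalFirst) {
        List<String> events = new ArrayList<>();
        ViewManager manager = new ViewManager(events);
        if (notifyGlobalFirst) {
            manager.stopping = true; // hypothetical pre-stop notification
        }
        for (String cache : caches) {
            events.add("stop cache " + cache);
            manager.onCacheLeft(cache);
        }
        manager.stop(); // the global component learns about the shutdown only here
        return events;
    }

    public static void main(String[] args) {
        List<String> caches = Collections.singletonList("x");
        // prints [stop cache x, install new view for x, view manager stopped]
        System.out.println(shutDown(caches, false));
        // prints [stop cache x, skip view install for x, view manager stopped]
        System.out.println(shutDown(caches, true));
    }
}
```

In the first run the manager still attempts a view installation because it is told about the shutdown last, which mirrors the superfluous DEBUG message; in the second run the early flag suppresses the attempt.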

            Michal Abaffy <mabaffy@redhat.com> made a comment on bug 854665

            I have also run this test on Jenkins machines, so you can see the full logs and the test configuration.

            See Build Artifacts->report->stdout.zip->slave2.log and time 05:09:35 in https://jenkins.mw.lab.eng.bos.redhat.com/hudson/view/EDG6/view/EDG-REPORTS-PERF/job/jdg60-benchmark-elasticity-02-04-radargun/2/

            That warning is also in slave4.log and is missing in slave1 and slave3.


            Martin Gencur <mgencur@redhat.com> changed the Status of bug 854665 from ON_QA to ASSIGNED


            Michal Abaffy (Inactive) added a comment:

            I have run benchmark elasticity 2-4 test with radargun on my localhost with infinispan-core:5.1.7-Final-redhat-1 several times and the problem seems to still be there.
            Example log:
            11:25:13,355 WARN [org.infinispan.transaction.TransactionTable] (pool-1-thread-1) ISPN000100: Stopping, but there are 5 local transactions and 0 remote transactions that did not finish in time.
            11:25:13,363 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-1,mabaffy-14775) Installing new view CacheView{viewId=8, members=[mabaffy-5803, mabaffy-30789]} for cache x
            11:25:13,371 DEBUG [org.infinispan.cacheviews.CacheViewsManagerImpl] (CacheViewInstaller-1,mabaffy-14775) Cache x view CacheView{viewId=8, members=[mabaffy-5803, mabaffy-30789]} installation was interrupted because the coordinator is shutting down

              dberinde@redhat.com Dan Berindei (Inactive)
              mlinhard Michal Linhard (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare
