
REST rolling upgrades, distributed -- new cluster can't load from old cluster properly


      An attempt to mimic the REST Rolling Upgrades process for one old and one new server in a clustered environment failed.

      The scenario is quite simple: we start 2 old servers, store some data in them, start 2 new servers, and point clients to the new cluster.
      When issuing a get on the new cluster (wanting to fetch an old entry from the old store), the operation fails with the attached stack trace.

      I also include the current ISPN test suite, to which the testRestRollingUpgradesDiffVersionsDist test has been added as a reproducer.

      Respective changes are mirrored in my remote branch: https://github.com/tsykora/infinispan/tree/ISPN-4330

      You can run the test like this:

      mvn clean verify -P suite.rolling.upgrades -Dzip.dist.old=/home/you/servers/previous-ispn-server-version.zip -Dtest=RestRollingUpgradesTest#testRestRollingUpgradesDiffVersionsDist

        1. cannot_be_cast.txt
          5 kB
        2. clustered.xml
          15 kB
        3. clustered-rest-rolling-upgrade.xml
          15 kB
        4. ISPN-4343.txt
          79 kB
        5. ISPN-4343.zip
          296 kB
        6. restRollUpsTraceLog.zip
          459 kB


            RH Bugzilla Integration added a comment - Tomas Sykora <tsykora@redhat.com> changed the Status of bug 1104659 from NEW to CLOSED

            Tomas Sykora added a comment -

            Update:

            I have 2 important pieces of information:

            1) This use case is now working for me as well.
            2) Many thanks to pruivo@redhat.com, who helped me a lot with this copy-paste-damn-that-easily-overlooked-thing issue!

            Closing, not a bug.
            Thanks!


            Pedro Ruivo added a comment -

            IMO, the codec needs to change. I didn't try it yet, but my idea is to change the codec to put the data in as the key and put the data type in the MimeMetadata; then we can get rid of MIMECacheEntry.
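            One reading of Pedro's proposal can be illustrated with a small, self-contained sketch (all class and method names here are hypothetical stand-ins, not the actual Infinispan API): keep the raw bytes as the stored value and move the content type into per-entry metadata, so reads no longer need to cast the value to a MIMECacheEntry wrapper.

```java
// Hypothetical sketch of the proposed codec change. Names are illustrative:
// instead of wrapping payload + content type in a MIMECacheEntry stored as
// the cache value, keep the raw bytes as the value and carry the content
// type in metadata, avoiding the "[B cannot be cast" failure on reads.
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MimeCodecSketch {

    // Stand-in for Infinispan's MimeMetadata: carries only the content type.
    record MimeMetadata(String contentType) {}

    // Minimal cache entry: raw bytes plus metadata, no wrapper class.
    record Entry(byte[] value, MimeMetadata metadata) {}

    private final Map<String, Entry> store = new HashMap<>();

    // Encode: store the payload as-is; the MIME type travels in the metadata.
    void put(String key, byte[] value, String contentType) {
        store.put(key, new Entry(value, new MimeMetadata(contentType)));
    }

    // Decode: a plain byte[] written by an old server can be read directly,
    // with no cast to a wrapper type.
    String getAsString(String key) {
        Entry e = store.get(key);
        return e == null ? null : new String(e.value(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        MimeCodecSketch cache = new MimeCodecSketch();
        cache.put("default.key1", "value1".getBytes(StandardCharsets.UTF_8), "text/plain");
        System.out.println(cache.getAsString("default.key1"));
    }
}
```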


            Tomas Sykora added a comment -

            I will fix it and issue a PR, assigning it to myself.

            Thank you for your help and input, Pedro!


            Pedro Ruivo added a comment -

            Ah, I forgot something... I had to disconnect the REST store from both nodes in the new cluster in your test.


            Tomas Sykora added a comment -

            Ah! I overlooked that... thank you very much, pruivo@redhat.com!

            Cool, a bug in the CLI is definitely less critical than a bug in REST rolling upgrades itself.


            Pedro Ruivo added a comment -

            Update:

            The test provided by tsykora@redhat.com is passing. There was a bug in the test where the RestStore was pointing to the HotRod port instead of the REST port.

            However, when I try with the CLI, we still have the same problem. I believe there is an issue in the CLI RestCodec.
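            For reference, a hedged sketch of what the fixed store configuration looks like (element and attribute names follow the Infinispan server schema of that era; bindings, hosts, and paths are illustrative): the outbound socket binding used by the REST store must resolve to the old cluster's REST connector port, not the HotRod port.

```xml
<!-- Illustrative only: the rest-store on the NEW cluster must point at the
     OLD cluster's REST connector. -->
<distributed-cache name="default">
    <rest-store path="/rest/default" shared="true" purge="false" passivation="false">
        <remote-server outbound-socket-binding="remote-store-rest"/>
    </rest-store>
</distributed-cache>

<!-- In the socket-binding group: 8080 is the old server's REST port;
     pointing this at the HotRod port (11222 by default) reproduces the
     test bug Pedro found. -->
<outbound-socket-binding name="remote-store-rest">
    <remote-destination host="127.0.0.1" port="8080"/>
</outbound-socket-binding>
```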


            Tomas Sykora added a comment -

            mgencur NadirX, you might be interested:

            I set aside our test suite and tried a manual setup – 2 old servers and 2 new servers, clustered:

            1) put 5 entries via CLI into old1 [put --codec=rest default.key1 value1]

            2) verify entries are replicated to old2 and accessible (ok) [get --codec=rest default.key1] – returned value1 – OK!

            3) try to remotely get key1 issuing get on new1 [get --codec=rest default.key1] – returned "[B cannot be cast to org.infinispan.remoting.MIMECacheEntry"
            – see attached cannot_be_cast.txt for full log

            4) new1 fetched this one entry from the REST remote cache store (old1) – OK (but can't read it properly)

            5) issue the synchronize-data operation on new1 using jconsole – it returned 5, which is OK; the server replied:
            10:19:11,716 INFO org.infinispan.upgrade.RollingUpgradeManager (RMI TCP Connection(5)-127.0.0.1) ISPN000216: 5 entries migrated to cache default in 74 milliseconds
            – and really, 5 entries were migrated; statistics say there are 5 entries in the default cache on new1

            6) try to obtain values from migrated entries [get --codec=rest key1] – returned "[B cannot be cast to org.infinispan.remoting.MIMECacheEntry" – see attached cannot_be_cast.txt for full log

            Adding a link to: https://issues.jboss.org/browse/ISPN-4200

            Summary: the process of REST rolling upgrades works fine even in a clustered environment. However, after migration, we are unable to successfully decode entries and read values on the new nodes. The problem occurs for both the CLI and the example configuration test.

            We need to solve that decoding issue to get REST rolling upgrades working properly.
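            The manual steps above can be condensed into an illustrative CLI session (the prompts are abbreviated, and the JMX operation name is an assumption based on the RollingUpgradeManager MBean of that era):

```
# on old1: store an entry through the REST codec
> put --codec=rest default.key1 value1

# on old2: entry replicated and readable
> get --codec=rest default.key1
value1

# on new1: the remote get reaches the REST cache store on old1,
# but decoding fails
> get --codec=rest default.key1
[B cannot be cast to org.infinispan.remoting.MIMECacheEntry

# on new1: invoke synchronizeData("rest") on the RollingUpgradeManager MBean
# (e.g. via jconsole); it returns 5 and the server logs:
#   ISPN000216: 5 entries migrated to cache default in 74 milliseconds
```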


            Tomas Sykora added a comment -

            The same result with <compatibility enabled="true"/> on the target cluster (just trying).

            Added clustered.xml, which is used as the config file for the OLD cluster, clustered-rest-rolling-upgrade.xml for the NEW cluster, and a zipped trace log (org.infinispan) from a test run.


            Tomas Sykora added a comment -

            NadirX I did some testing around this and found a few interesting things:

            1) The same scenario works for HotRod rolling upgrades, even though I can see a JGRP000010 message there as well. Anyway, I will fix the test to completely avoid this.

            2) I avoided JGRP000010 in the REST rolling upgrades test (using a +1 switch here):
            a) <socket-binding name="jgroups-udp" port="55200" multicast-address="234.99.54.15" multicast-port="45689"/>
            b) <socket-binding name="jgroups-mping" port="55200" multicast-address="234.99.54.15" multicast-port="45701"/> and
            c) <socket-binding name="modcluster" port="0" multicast-address="224.0.1.116" multicast-port="23365"/>

            This differentiates the clusters pretty well, I suppose.

            The old cluster's cache-container name is clustered; the new cluster's cache-container name is clustered-new – I consider this OK, as HotRod rolling upgrades are passing.

            When I use the approach stated in 2), HotRod RU is passing, but REST RU still has the same issue, just without JGroups complaining about different versions and dropping packets.

            Any thoughts, Tristan? It looks like I've exhausted my ideas for now.
            Thank you!


            Tristan Tarrant added a comment -

            Tomas, I see something very dangerous there:

            JGRP000010: packet from 10.200.136.193:45688 has different version (3.4.3) than ours (3.5.0); packet is discarded

            which means that both clusters are on the same multicast address. This MUST NOT be.
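            The remedy is to give each cluster its own multicast group; a minimal sketch of the two UDP socket bindings (addresses and ports are illustrative, consistent with the 45688/45689 pair seen in the logs):

```xml
<!-- OLD cluster: its own multicast group -->
<socket-binding name="jgroups-udp" port="55200"
                multicast-address="234.99.54.14" multicast-port="45688"/>

<!-- NEW cluster: a different multicast address AND port, so the 3.4.3 and
     3.5.0 JGroups stacks never see each other's discovery packets -->
<socket-binding name="jgroups-udp" port="55200"
                multicast-address="234.99.54.15" multicast-port="45689"/>
```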


            Tomas Sykora added a comment -

            Attached server output and the packed test suite containing the newly added test as a reproducer.


              tsykora@redhat.com Tomas Sykora
              tsykora@redhat.com Tomas Sykora
              Archiver:
              rhn-support-adongare Amol Dongare
