
REST rolling upgrades, distributed -- new cluster can't load from old cluster properly


      An attempt to mimic the REST Rolling Upgrades process for one old and one new server in a clustered environment failed.

      The scenario is quite simple: we start 2 old servers, store some data in them, start 2 new servers, and point clients to the new cluster.
      When issuing a get on the new cluster (wanting to fetch an old entry from the old store), the operation fails with the attached stack trace.

      I also include the current ISPN test suite, to which the testRestRollingUpgradesDiffVersionsDist test has been added as a reproducer.

      Respective changes are mirrored in my remote branch: https://github.com/tsykora/infinispan/tree/ISPN-4330

      You can run the test like this:

      mvn clean verify -P suite.rolling.upgrades -Dzip.dist.old=/home/you/servers/previous-ispn-server-version.zip -Dtest=RestRollingUpgradesTest#testRestRollingUpgradesDiffVersionsDist

        1. cannot_be_cast.txt
          5 kB
        2. clustered.xml
          15 kB
        3. clustered-rest-rolling-upgrade.xml
          15 kB
        4. ISPN-4343.txt
          79 kB
        5. ISPN-4343.zip
          296 kB
        6. restRollUpsTraceLog.zip
          459 kB


            RH Bugzilla Integration added a comment - Tomas Sykora <tsykora@redhat.com> changed the Status of bug 1104659 from NEW to CLOSED

            Tomas Sykora added a comment -

            Update:

            I have 2 important pieces of information:

            1) This use case is now working for me as well.
            2) Many thanks to pruivo@redhat.com, who helped me a lot with this copy-paste-damn-that-easily-overlooked-thing issue!

            Closing, not a bug.
            Thanks!


            Pedro Ruivo added a comment -

            IMO, the codec needs to change. I didn't try it yet, but my idea is to change the codec to put the data in as the key and put the data type in the MimeMetadata; then we can get rid of MIMECacheEntry.
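            One reading of Pedro's proposal can be illustrated with a small, self-contained sketch (all class and method names here are hypothetical stand-ins, not the actual Infinispan API): keep the raw bytes as the stored value and move the content type into per-entry metadata, so reads no longer need to cast the value to a MIMECacheEntry wrapper.

```java
// Hypothetical sketch of the proposed codec change. Names are illustrative:
// instead of wrapping payload + content type in a MIMECacheEntry stored as
// the cache value, keep the raw bytes as the value and carry the content
// type in metadata, avoiding the "[B cannot be cast" failure on reads.
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MimeCodecSketch {

    // Stand-in for Infinispan's MimeMetadata: carries only the content type.
    record MimeMetadata(String contentType) {}

    // Minimal cache entry: raw bytes plus metadata, no wrapper class.
    record Entry(byte[] value, MimeMetadata metadata) {}

    private final Map<String, Entry> store = new HashMap<>();

    // Encode: store the payload as-is; the MIME type travels in the metadata.
    void put(String key, byte[] value, String contentType) {
        store.put(key, new Entry(value, new MimeMetadata(contentType)));
    }

    // Decode: a plain byte[] written by an old server can be read directly,
    // with no cast to a wrapper type.
    String getAsString(String key) {
        Entry e = store.get(key);
        return e == null ? null : new String(e.value(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        MimeCodecSketch cache = new MimeCodecSketch();
        cache.put("default.key1", "value1".getBytes(StandardCharsets.UTF_8), "text/plain");
        System.out.println(cache.getAsString("default.key1"));
    }
}
```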


            Tomas Sykora added a comment -

            I will fix it and issue a PR, assigning it to myself.

            Thank you for your help and input, Pedro!


            Pedro Ruivo added a comment -

            Ah, I forgot something... I had to disconnect the REST store from both nodes in the new cluster in your test.


            Tomas Sykora added a comment -

            Ah! I overlooked that... thank you very much, pruivo@redhat.com!

            Cool, a bug in the CLI is definitely less critical than a bug in REST rolling upgrades itself.


            Pedro Ruivo added a comment -

            Update:

            The test provided by tsykora@redhat.com is passing. There was a bug in the test where the RestStore was pointing to the HotRod port instead of the REST port.

            However, when I try with the CLI, we still have the same problem. I believe there is an issue in the CLI RestCodec.
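            For reference, a hedged sketch of what the fixed store configuration looks like (element and attribute names follow the Infinispan server schema of that era; bindings, hosts, and paths are illustrative): the outbound socket binding used by the REST store must resolve to the old cluster's REST connector port, not the HotRod port.

```xml
<!-- Illustrative only: the rest-store on the NEW cluster must point at the
     OLD cluster's REST connector. -->
<distributed-cache name="default">
    <rest-store path="/rest/default" shared="true" purge="false" passivation="false">
        <remote-server outbound-socket-binding="remote-store-rest"/>
    </rest-store>
</distributed-cache>

<!-- In the socket-binding group: 8080 is the old server's REST port;
     pointing this at the HotRod port (11222 by default) reproduces the
     test bug Pedro found. -->
<outbound-socket-binding name="remote-store-rest">
    <remote-destination host="127.0.0.1" port="8080"/>
</outbound-socket-binding>
```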


            Tomas Sykora added a comment -

            mgencur NadirX, you might be interested:

            I set aside our test suite and tried a manual setup – 2 old servers and 2 new servers, clustered:

            1) put 5 entries via CLI into old1 [put --codec=rest default.key1 value1]

            2) verify entries are replicated to old2 and accessible (ok) [get --codec=rest default.key1] – returned value1 – OK!

            3) try to remotely get key1 issuing get on new1 [get --codec=rest default.key1] – returned "[B cannot be cast to org.infinispan.remoting.MIMECacheEntry"
            – see attached cannot_be_cast.txt for full log

            4) new1 fetched this one entry from the REST remote cache store (old1) – OK (but can't read it properly)

            5) issue the synchronize-data operation on new1 using jconsole – it returned 5, which is OK; the server replied:
            10:19:11,716 INFO org.infinispan.upgrade.RollingUpgradeManager (RMI TCP Connection(5)-127.0.0.1) ISPN000216: 5 entries migrated to cache default in 74 milliseconds
            – and really, 5 entries were migrated; statistics say there are 5 entries in the default cache on new1

            6) try to obtain values from migrated entries [get --codec=rest key1] – returned "[B cannot be cast to org.infinispan.remoting.MIMECacheEntry" – see attached cannot_be_cast.txt for full log

            Adding a link to: https://issues.jboss.org/browse/ISPN-4200

            Summary: the process of REST rolling upgrades works fine even in a clustered environment. However, after migration, we are unable to successfully decode entries and read values on the new nodes. The problem occurs for both the CLI and the example configuration test.

            We need to solve that decoding issue to get REST rolling upgrades working properly.
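            The manual steps above can be condensed into an illustrative CLI session (the prompts are abbreviated, and the JMX operation name is an assumption based on the RollingUpgradeManager MBean of that era):

```
# on old1: store an entry through the REST codec
> put --codec=rest default.key1 value1

# on old2: entry replicated and readable
> get --codec=rest default.key1
value1

# on new1: the remote get reaches the REST cache store on old1,
# but decoding fails
> get --codec=rest default.key1
[B cannot be cast to org.infinispan.remoting.MIMECacheEntry

# on new1: invoke synchronizeData("rest") on the RollingUpgradeManager MBean
# (e.g. via jconsole); it returns 5 and the server logs:
#   ISPN000216: 5 entries migrated to cache default in 74 milliseconds
```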


            Tomas Sykora added a comment -

            The same result with <compatibility enabled="true"/> on the target cluster (just trying).

            Added clustered.xml, which is used as the config file for the OLD cluster, clustered-rest-rolling-upgrade.xml for the NEW cluster, and a zipped trace log (org.infinispan) from a test run.


            Tomas Sykora added a comment -

            NadirX I did some testing around this and found a few interesting things:

            1) The same scenario works for HotRod rolling upgrades, even though I can see a JGRP000010 message there as well. Anyway, I will fix the test to completely avoid this.

            2) I avoided JGRP000010 in the REST rolling upgrades test (using a +1 switch here):
            a) <socket-binding name="jgroups-udp" port="55200" multicast-address="234.99.54.15" multicast-port="45689"/>
            b) <socket-binding name="jgroups-mping" port="55200" multicast-address="234.99.54.15" multicast-port="45701"/> and
            c) <socket-binding name="modcluster" port="0" multicast-address="224.0.1.116" multicast-port="23365"/>

            This differentiates the clusters pretty well, I suppose.

            The old cluster's cache-container name is clustered; the new cluster's cache-container name is clustered-new – I consider this OK, as HotRod rolling upgrades are passing.

            When I use the approach stated in 2), HotRod RU is passing, but REST RU still has the same issue, just without JGroups complaining about different versions and dropping packets.

            Any thoughts, Tristan? It looks like I've exhausted my ideas for now.
            Thank you!


            Tristan Tarrant added a comment -

            Tomas, I see something very dangerous there:

            JGRP000010: packet from 10.200.136.193:45688 has different version (3.4.3) than ours (3.5.0); packet is discarded

            which means that both clusters are on the same multicast address. This MUST NOT be.
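            The remedy is to give each cluster its own multicast group; a minimal sketch of the two UDP socket bindings (addresses and ports are illustrative, consistent with the 45688/45689 pair seen in the logs):

```xml
<!-- OLD cluster: its own multicast group -->
<socket-binding name="jgroups-udp" port="55200"
                multicast-address="234.99.54.14" multicast-port="45688"/>

<!-- NEW cluster: a different multicast address AND port, so the 3.4.3 and
     3.5.0 JGroups stacks never see each other's discovery packets -->
<socket-binding name="jgroups-udp" port="55200"
                multicast-address="234.99.54.15" multicast-port="45689"/>
```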


            Tomas Sykora added a comment -

            Attached server output and the packed test suite containing the newly added test as a reproducer.


              tsykora@redhat.com Tomas Sykora
              tsykora@redhat.com Tomas Sykora
              Archiver:
              rhn-support-adongare Amol Dongare
