[ISPN-1965] Some entries not available during view change

Mircea Markus (Inactive) added a comment - 2013/08/29 1:20 PM

mgencur
There are two types of view changes: nodes leaving/joing a partition, and the cluster splitting in two (or more) sub-partitions (split-brains). NBST only works in the scope of the former.

Mircea Markus (Inactive) added a comment - 2013/08/29 1:20 PM mgencur There are two types of view changes: nodes leaving/joing a partition, and the cluster splitting in two (or more) sub-partitions (split-brains). NBST only works in the scope of the former.

Martin Gencur added a comment - 2013/08/26 3:45 AM

Mircea, I thought NBST was implemented so that the data is available during the view change. Can you please shed some light on the part of the design which alows for data not to be available during view change and thus, why this issue was rejected? Thanks

Martin Gencur added a comment - 2013/08/26 3:45 AM Mircea, I thought NBST was implemented so that the data is available during the view change. Can you please shed some light on the part of the design which alows for data not to be available during view change and thus, why this issue was rejected? Thanks

RH Bugzilla Integration added a comment - 2013/08/02 6:04 PM

Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 808623 from MODIFIED to ON_QA

RH Bugzilla Integration added a comment - 2013/08/02 6:04 PM Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 808623 from MODIFIED to ON_QA

RH Bugzilla Integration added a comment - 2013/08/02 4:48 PM

Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 808623 from ASSIGNED to MODIFIED

RH Bugzilla Integration added a comment - 2013/08/02 4:48 PM Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 808623 from ASSIGNED to MODIFIED

Mircea Markus (Inactive) added a comment - 2013/05/07 5:07 AM

we don't offer any consistency guarantees after partition healing.

Mircea Markus (Inactive) added a comment - 2013/05/07 5:07 AM we don't offer any consistency guarantees after partition healing.

RH Bugzilla Integration added a comment - 2013/05/06 11:38 PM

Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

Set flag to nominate this bug for 6.2 release notes.

RH Bugzilla Integration added a comment - 2013/05/06 11:38 PM Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623 Set flag to nominate this bug for 6.2 release notes.

Radim Vansa (Inactive) added a comment - 2013/02/06 4:35 AM

I don't test partition split as the result is obvious and merge is not fully implemented yet. Nevertheless, InfinispanPartitionableWrapper is implemented and you may use it - however, I really don't know what should be tested now with it (I have implemented that after reading some article which has proven to be just design doc, not implementation status).

Radim Vansa (Inactive) added a comment - 2013/02/06 4:35 AM I don't test partition split as the result is obvious and merge is not fully implemented yet. Nevertheless, InfinispanPartitionableWrapper is implemented and you may use it - however, I really don't know what should be tested now with it (I have implemented that after reading some article which has proven to be just design doc, not implementation status).

Michal Linhard (Inactive) added a comment - 2013/02/05 3:05 PM

Yes, I think we can still simulate scenario with network partition where we lose data. I haven't done that lately, since I know recovering from partitions after merge isn't implemented yet. Maybe rvansa1@redhat.com has seen some of these symptoms recently during his RadarGun resilience tests ?

Michal Linhard (Inactive) added a comment - 2013/02/05 3:05 PM Yes, I think we can still simulate scenario with network partition where we lose data. I haven't done that lately, since I know recovering from partitions after merge isn't implemented yet. Maybe rvansa1@redhat.com has seen some of these symptoms recently during his RadarGun resilience tests ?

Mircea Markus (Inactive) added a comment - 2013/02/05 9:56 AM

mlinhardis this stil actual with NBST?

Mircea Markus (Inactive) added a comment - 2013/02/05 9:56 AM mlinhard is this stil actual with NBST?

RH Bugzilla Integration added a comment - 2013/01/17 5:13 AM

Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623

@Tristan, this probably won't get fixed until Eventual Consistency is implemented or dealing with partitions is somehow solved. So 6.1.0.ER9 isn't the right milestone for this ....

RH Bugzilla Integration added a comment - 2013/01/17 5:13 AM Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623 @Tristan, this probably won't get fixed until Eventual Consistency is implemented or dealing with partitions is somehow solved. So 6.1.0.ER9 isn't the right milestone for this ....

RH Bugzilla Integration added a comment - 2012/12/12 8:47 AM

Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 808623 from NEW to ASSIGNED

RH Bugzilla Integration added a comment - 2012/12/12 8:47 AM Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 808623 from NEW to ASSIGNED

RH Bugzilla Integration added a comment - 2012/11/14 9:42 AM

mark yarborough <myarboro@redhat.com> made a comment on bug 808623

ttarrant will add jira links as appropriate.

RH Bugzilla Integration added a comment - 2012/11/14 9:42 AM mark yarborough <myarboro@redhat.com> made a comment on bug 808623 ttarrant will add jira links as appropriate.

RH Bugzilla Integration added a comment - 2012/06/05 11:24 PM

Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,12 +1,9 @@
-In rare circumstances, when a node leaves the cluster, instead of going
-directly to a new cluster view that displays all nodes save the note that has departed, the cluster splits into two partitions which then merge after a short amount of time. During this time, some nodes do not have access to all the data that previously existed in the cache. After the merge, all nodes regain access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster.
+In rare circumstances, when a node leaves the cluster, instead of going directly to a new cluster view that displays all nodes save the note that has departed, the cluster splits into two partitions which then merge after a short amount of time. During this time, some nodes do not have access to all the data that previously existed in the cache. After the merge, all nodes regain access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster.
</para>
<para>
Normally, when the view changes because a node joins or leaves, the cache data is
rebalanced on the new cluster members. However, if the number of nodes that leaves the cluster in quick succession equals or is greater than the value of numOwners, keys for the departed nodes are lost. This occurs during a network split as well - regardless of the reasons for the partitions forming, at least one partition will not have all the data (assuming cluster size is greater than numOwners).
</para>
<para>
-While there are multiple partitions, each one can make changes to the data
-independently, so a remote client will see inconsistencies in the data. When
-merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so
+While there are multiple partitions, each one can make changes to the data independently, so a remote client will see inconsistencies in the data. When merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so
different nodes may hold different values even after the merge.

RH Bugzilla Integration added a comment - 2012/06/05 11:24 PM Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623 Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,12 +1,9 @@ -In rare circumstances, when a node leaves the cluster, instead of going -directly to a new cluster view that displays all nodes save the note that has departed, the cluster splits into two partitions which then merge after a short amount of time. During this time, some nodes do not have access to all the data that previously existed in the cache. After the merge, all nodes regain access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster. +In rare circumstances, when a node leaves the cluster, instead of going directly to a new cluster view that displays all nodes save the note that has departed, the cluster splits into two partitions which then merge after a short amount of time. During this time, some nodes do not have access to all the data that previously existed in the cache. After the merge, all nodes regain access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster. </para> <para> Normally, when the view changes because a node joins or leaves, the cache data is rebalanced on the new cluster members. However, if the number of nodes that leaves the cluster in quick succession equals or is greater than the value of numOwners, keys for the departed nodes are lost. This occurs during a network split as well - regardless of the reasons for the partitions forming, at least one partition will not have all the data (assuming cluster size is greater than numOwners). </para> <para> -While there are multiple partitions, each one can make changes to the data -independently, so a remote client will see inconsistencies in the data. When -merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so +While there are multiple partitions, each one can make changes to the data independently, so a remote client will see inconsistencies in the data. When merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so different nodes may hold different values even after the merge.

RH Bugzilla Integration added a comment - 2012/06/05 11:24 PM

Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

This bug is nominated as a known issue for JDG 6 GA Release Notes. If this is not meant to be included till 6.1, perhaps we should exclude this for now. Setting NEEDINFO to Mark to set this to technical_note+ to exclude it, if needed.

RH Bugzilla Integration added a comment - 2012/06/05 11:24 PM Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623 This bug is nominated as a known issue for JDG 6 GA Release Notes. If this is not meant to be included till 6.1, perhaps we should exclude this for now. Setting NEEDINFO to Mark to set this to technical_note+ to exclude it, if needed.

RH Bugzilla Integration added a comment - 2012/04/20 10:53 AM

Manik Surtani <msurtani@redhat.com> made a comment on bug 808623

No, EC may be a 7.0 feature. A lot of people I speak to in the community don't see this as a priority.

RH Bugzilla Integration added a comment - 2012/04/20 10:53 AM Manik Surtani <msurtani@redhat.com> made a comment on bug 808623 No, EC may be a 7.0 feature. A lot of people I speak to in the community don't see this as a priority.

RH Bugzilla Integration added a comment - 2012/04/20 10:49 AM

Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623

I guess this one doesn't have quick solution and should be postponed,
It's basically about infinispan not being able to handle partitions. We're waiting for eventual consistency feature with this, arent we ?
so 6.1.0.GA ?

RH Bugzilla Integration added a comment - 2012/04/20 10:49 AM Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623 I guess this one doesn't have quick solution and should be postponed, It's basically about infinispan not being able to handle partitions. We're waiting for eventual consistency feature with this, arent we ? so 6.1.0.GA ?

RH Bugzilla Integration added a comment - 2012/04/20 9:31 AM

Tristan Tarrant <ttarrant@redhat.com> made a comment on bug 808623

RH Bugzilla Integration added a comment - 2012/04/20 9:31 AM Tristan Tarrant <ttarrant@redhat.com> made a comment on bug 808623

RH Bugzilla Integration added a comment - 2012/04/06 7:08 PM

Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,4 +1,12 @@
-When a number of nodes larger than the value of numOwner leave a cluster, JBoss Data Grid cannot guarantee that all key values are preserved. In a four node cluster, each partition has two nodes. As a result, each partition loses a number of nodes that equals the value of numOwner and keys that exist prior to the nodes leaving the cluster may not be preserved in both partitions.
+In rare circumstances, when a node leaves the cluster, instead of going
+directly to a new cluster view that displays all nodes save the note that has departed, the cluster splits into two partitions which then merge after a short amount of time. During this time, some nodes do not have access to all the data that previously existed in the cache. After the merge, all nodes regain access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster.
</para>
<para>
-When partitions are merged into a single cluster, key values are preserved in the new cluster (assuming that no clients modified these values during the network split). If a client modified a key during the network split, the old value may be retrieved, the new value may be retrieved, and in some cases the old value may be retrieved after the old value is retrieved. This policy applies to creation and removal as well, if the missing key is equated with a null value..+Normally, when the view changes because a node joins or leaves, the cache data is
+rebalanced on the new cluster members. However, if the number of nodes that leaves the cluster in quick succession equals or is greater than the value of numOwners, keys for the departed nodes are lost. This occurs during a network split as well - regardless of the reasons for the partitions forming, at least one partition will not have all the data (assuming cluster size is greater than numOwners).
+</para>
+<para>
+While there are multiple partitions, each one can make changes to the data
+independently, so a remote client will see inconsistencies in the data. When
+merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so
+different nodes may hold different values even after the merge.

RH Bugzilla Integration added a comment - 2012/04/06 7:08 PM Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623 Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,4 +1,12 @@ -When a number of nodes larger than the value of numOwner leave a cluster, JBoss Data Grid cannot guarantee that all key values are preserved. In a four node cluster, each partition has two nodes. As a result, each partition loses a number of nodes that equals the value of numOwner and keys that exist prior to the nodes leaving the cluster may not be preserved in both partitions. +In rare circumstances, when a node leaves the cluster, instead of going +directly to a new cluster view that displays all nodes save the note that has departed, the cluster splits into two partitions which then merge after a short amount of time. During this time, some nodes do not have access to all the data that previously existed in the cache. After the merge, all nodes regain access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster. </para> <para> -When partitions are merged into a single cluster, key values are preserved in the new cluster (assuming that no clients modified these values during the network split). If a client modified a key during the network split, the old value may be retrieved, the new value may be retrieved, and in some cases the old value may be retrieved after the old value is retrieved. This policy applies to creation and removal as well, if the missing key is equated with a null value..+Normally, when the view changes because a node joins or leaves, the cache data is +rebalanced on the new cluster members. However, if the number of nodes that leaves the cluster in quick succession equals or is greater than the value of numOwners, keys for the departed nodes are lost. This occurs during a network split as well - regardless of the reasons for the partitions forming, at least one partition will not have all the data (assuming cluster size is greater than numOwners). +</para> +<para> +While there are multiple partitions, each one can make changes to the data +independently, so a remote client will see inconsistencies in the data. When +merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so +different nodes may hold different values even after the merge.

RH Bugzilla Integration added a comment - 2012/04/05 6:44 AM

Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,3 +1,4 @@
When a number of nodes larger than the value of numOwner leave a cluster, JBoss Data Grid cannot guarantee that all key values are preserved. In a four node cluster, each partition has two nodes. As a result, each partition loses a number of nodes that equals the value of numOwner and keys that exist prior to the nodes leaving the cluster may not be preserved in both partitions.
-
+</para>
+<para>
When partitions are merged into a single cluster, key values are preserved in the new cluster (assuming that no clients modified these values during the network split). If a client modified a key during the network split, the old value may be retrieved, the new value may be retrieved, and in some cases the old value may be retrieved after the old value is retrieved. This policy applies to creation and removal as well, if the missing key is equated with a null value..

RH Bugzilla Integration added a comment - 2012/04/05 6:44 AM Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623 Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,3 +1,4 @@ When a number of nodes larger than the value of numOwner leave a cluster, JBoss Data Grid cannot guarantee that all key values are preserved. In a four node cluster, each partition has two nodes. As a result, each partition loses a number of nodes that equals the value of numOwner and keys that exist prior to the nodes leaving the cluster may not be preserved in both partitions. - +</para> +<para> When partitions are merged into a single cluster, key values are preserved in the new cluster (assuming that no clients modified these values during the network split). If a client modified a key during the network split, the old value may be retrieved, the new value may be retrieved, and in some cases the old value may be retrieved after the old value is retrieved. This policy applies to creation and removal as well, if the missing key is equated with a null value..

RH Bugzilla Integration added a comment - 2012/04/05 5:43 AM

Dan Berindei <dberinde@redhat.com> made a comment on bug 808623

Misha, after reading it again I think it could be a little clearer. So here's another attempt:

In rare circumstances, when a node leaves the cluster, instead of going directly to a new cluster view that contains everyone but the leaver, the cluster splits into two partitions which then merge after a short amount of time. During this time, at least some nodes will not have access to all the data that previously existed in the cache. After the merge, all the nodes will again have access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster.

Normally, when the view changes because of a join or a leave, the cache data is rebalanced on the new cluster members. However, if numOwners or more nodes leave in quick succession, keys for which all nodes have left will be lost. The same thing happens during a network split - regardless how the partitions form, there will be at least one partition that doesn't have all the data (assuming cluster size > numOwners).

While there are multiple partitions, each one can make changes to the data independently, so a remote client will see inconsistencies in the data. When merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so different nodes may hold different values even after the merge.

RH Bugzilla Integration added a comment - 2012/04/05 5:43 AM Dan Berindei <dberinde@redhat.com> made a comment on bug 808623 Misha, after reading it again I think it could be a little clearer. So here's another attempt: In rare circumstances, when a node leaves the cluster, instead of going directly to a new cluster view that contains everyone but the leaver, the cluster splits into two partitions which then merge after a short amount of time. During this time, at least some nodes will not have access to all the data that previously existed in the cache. After the merge, all the nodes will again have access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster. Normally, when the view changes because of a join or a leave, the cache data is rebalanced on the new cluster members. However, if numOwners or more nodes leave in quick succession, keys for which all nodes have left will be lost. The same thing happens during a network split - regardless how the partitions form, there will be at least one partition that doesn't have all the data (assuming cluster size > numOwners). While there are multiple partitions, each one can make changes to the data independently, so a remote client will see inconsistencies in the data. When merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so different nodes may hold different values even after the merge.

RH Bugzilla Integration added a comment - 2012/04/05 4:53 AM

Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

Dan, please correct if anything is not accurate in the technical notes field, or remove the NeedInfo if you approve.

RH Bugzilla Integration added a comment - 2012/04/05 4:53 AM Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623 Dan, please correct if anything is not accurate in the technical notes field, or remove the NeedInfo if you approve.

RH Bugzilla Integration added a comment - 2012/04/05 4:52 AM

Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,3 @@
-When the view is changed, some entries are unavailable to some clients, despite existing in the cluster and being loaded in the data loading phase. The total number of entries (retrieved by JMX) is correct, therefore the missing entries are not lost. This error occurs for a brief period of time and then ceases.+When a number of nodes larger than the value of numOwner leave a cluster, JBoss Data Grid cannot guarantee that all key values are preserved. In a four node cluster, each partition has two nodes. As a result, each partition loses a number of nodes that equals the value of numOwner and keys that exist prior to the nodes leaving the cluster may not be preserved in both partitions.
+
+When partitions are merged into a single cluster, key values are preserved in the new cluster (assuming that no clients modified these values during the network split). If a client modified a key during the network split, the old value may be retrieved, the new value may be retrieved, and in some cases the old value may be retrieved after the old value is retrieved. This policy applies to creation and removal as well, if the missing key is equated with a null value..

RH Bugzilla Integration added a comment - 2012/04/05 4:52 AM Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623 Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1 +1,3 @@ -When the view is changed, some entries are unavailable to some clients, despite existing in the cluster and being loaded in the data loading phase. The total number of entries (retrieved by JMX) is correct, therefore the missing entries are not lost. This error occurs for a brief period of time and then ceases.+When a number of nodes larger than the value of numOwner leave a cluster, JBoss Data Grid cannot guarantee that all key values are preserved. In a four node cluster, each partition has two nodes. As a result, each partition loses a number of nodes that equals the value of numOwner and keys that exist prior to the nodes leaving the cluster may not be preserved in both partitions. + +When partitions are merged into a single cluster, key values are preserved in the new cluster (assuming that no clients modified these values during the network split). If a client modified a key during the network split, the old value may be retrieved, the new value may be retrieved, and in some cases the old value may be retrieved after the old value is retrieved. This policy applies to creation and removal as well, if the missing key is equated with a null value..

RH Bugzilla Integration added a comment - 2012/04/04 10:07 PM

Dan Berindei <dberinde@redhat.com> made a comment on bug 808623

Infinispan doesn't guarantee anything when when more than numOwner nodes leave the cluster. When we have a split in a 4-nodes cluster and each partition has 2 nodes, that means each partition will have lost numOwners nodes and Infinispan can't guarantee that all the pre-existing keys will be kept in both partitions.

When the partitions merge and we get a single cluster, the key values are usually preserved in the new cluster - assuming that no client modifies the values during the network split. If a client modified a key during the split, Infinispan doesn't offer any guarantees: a client could retrieve the old value or the new value (and it could retrieve the old value after it retrieved the new value). This policy applies for creation/removal as well, if we equate a missing key with a null value.

RH Bugzilla Integration added a comment - 2012/04/04 10:07 PM Dan Berindei <dberinde@redhat.com> made a comment on bug 808623 Infinispan doesn't guarantee anything when when more than numOwner nodes leave the cluster. When we have a split in a 4-nodes cluster and each partition has 2 nodes, that means each partition will have lost numOwners nodes and Infinispan can't guarantee that all the pre-existing keys will be kept in both partitions. When the partitions merge and we get a single cluster, the key values are usually preserved in the new cluster - assuming that no client modifies the values during the network split. If a client modified a key during the split, Infinispan doesn't offer any guarantees: a client could retrieve the old value or the new value (and it could retrieve the old value after it retrieved the new value). This policy applies for creation/removal as well, if we equate a missing key with a null value.

RH Bugzilla Integration added a comment - 2012/04/04 12:03 PM

Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,5 +1 @@
-CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6.
+When the view is changed, some entries are unavailable to some clients, despite existing in the cluster and being loaded in the data loading phase. The total number of entries (retrieved by JMX) is correct, therefore the missing entries are not lost. This error occurs for a brief period of time and then ceases.-
-This is a new bug not yet thoroughly investigated.
-
-I can only tell the symptoms: during a view change (probably when a partition occurs) some entries aren't available to certain clients although they exist somewhere in the cluster - it was loaded in the data loading phase. The data isn't lost though. Total number of entries (retrieved via JMX) is correct throughout the test. These errors occurs only during a brief period of time and then cease.

RH Bugzilla Integration added a comment - 2012/04/04 12:03 PM Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623 Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,5 +1 @@ -CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6. +When the view is changed, some entries are unavailable to some clients, despite existing in the cluster and being loaded in the data loading phase. The total number of entries (retrieved by JMX) is correct, therefore the missing entries are not lost. This error occurs for a brief period of time and then ceases.- -This is a new bug not yet thoroughly investigated. - -I can only tell the symptoms: during a view change (probably when a partition occurs) some entries aren't available to certain clients although they exist somewhere in the cluster - it was loaded in the data loading phase. The data isn't lost though. Total number of entries (retrieved via JMX) is correct throughout the test. These errors occurs only during a brief period of time and then cease.

RH Bugzilla Integration added a comment - 2012/04/04 11:02 AM

Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623

Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,5 @@
-CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6.+CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6.
+
+This is a new bug not yet thoroughly investigated.
+
+I can only tell the symptoms: during a view change (probably when a partition occurs) some entries aren't available to certain clients although they exist somewhere in the cluster - it was loaded in the data loading phase. The data isn't lost though. Total number of entries (retrieved via JMX) is correct throughout the test. These errors occurs only during a brief period of time and then cease.

RH Bugzilla Integration added a comment - 2012/04/04 11:02 AM Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623 Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1 +1,5 @@ -CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6.+CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6. + +This is a new bug not yet thoroughly investigated. + +I can only tell the symptoms: during a view change (probably when a partition occurs) some entries aren't available to certain clients although they exist somewhere in the cluster - it was loaded in the data loading phase. The data isn't lost though. Total number of entries (retrieved via JMX) is correct throughout the test. These errors occurs only during a brief period of time and then cease.

RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM

Manik Surtani <msurtani@redhat.com> made a comment on bug 808623

Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6.

RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM Manik Surtani <msurtani@redhat.com> made a comment on bug 808623 Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6.

RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM

JBoss JIRA Server <jira-update@redhat.com> made a comment on bug 808623

Michal Linhard <mlinhard@redhat.com> made a comment on jira ~~ISPN-1965~~

As you can see here, hyperion hosts have two interfaces: https://docspace.corp.redhat.com/docs/DOC-93047
From the IPs used in the tests you can see that all of them are mapped to eth0 now.

RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM JBoss JIRA Server <jira-update@redhat.com> made a comment on bug 808623 Michal Linhard <mlinhard@redhat.com> made a comment on jira ISPN-1965 As you can see here, hyperion hosts have two interfaces: https://docspace.corp.redhat.com/docs/DOC-93047 From the IPs used in the tests you can see that all of them are mapped to eth0 now.

RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM

JBoss JIRA Server <jira-update@redhat.com> made a comment on bug 808623

Michal Linhard <mlinhard@redhat.com> made a comment on jira ~~ISPN-1965~~

It happened on hyperion. But Infiniband is not used anymore. We abandoned Infiniband network after an e-mail discussion. It's pure Ethernet now.
For ER6 all elasticity/resilience tests were run on hyperion - again with Ethernet.

What happens on hyperion is perfectly valid now. Several issues happen consistently on both edg-perflab and hyperion.

RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM JBoss JIRA Server <jira-update@redhat.com> made a comment on bug 808623 Michal Linhard <mlinhard@redhat.com> made a comment on jira ISPN-1965 It happened on hyperion. But Infiniband is not used anymore. We abandoned Infiniband network after an e-mail discussion. It's pure Ethernet now. For ER6 all elasticity/resilience tests were run on hyperion - again with Ethernet. What happens on hyperion is perfectly valid now. Several issues happen consistently on both edg-perflab and hyperion.

Michal Linhard (Inactive) added a comment - 2012/04/04 6:12 AM

As you can see here, hyperion hosts have two interfaces: https://docspace.corp.redhat.com/docs/DOC-93047
From the IPs used in the tests you can see that all of them are mapped to eth0 now.

Michal Linhard (Inactive) added a comment - 2012/04/04 6:12 AM As you can see here, hyperion hosts have two interfaces: https://docspace.corp.redhat.com/docs/DOC-93047 From the IPs used in the tests you can see that all of them are mapped to eth0 now.

Michal Linhard (Inactive) added a comment - 2012/04/04 6:09 AM

It happened on hyperion. But Infiniband is not used anymore. We abandoned Infiniband network after an e-mail discussion. It's pure Ethernet now.
For ER6 all elasticity/resilience tests were run on hyperion - again with Ethernet.

What happens on hyperion is perfectly valid now. Several issues happen consistently on both edg-perflab and hyperion.

Michal Linhard (Inactive) added a comment - 2012/04/04 6:09 AM It happened on hyperion. But Infiniband is not used anymore. We abandoned Infiniband network after an e-mail discussion. It's pure Ethernet now. For ER6 all elasticity/resilience tests were run on hyperion - again with Ethernet. What happens on hyperion is perfectly valid now. Several issues happen consistently on both edg-perflab and hyperion.

Manik Surtani (Inactive) added a comment - 2012/04/04 6:01 AM

@Michal Again, is this only a Hyperion/Infiniband issue? If so, we should tag all related issues together - perhaps as hyperion_only - so that we can organise JIRAs better.

Manik Surtani (Inactive) added a comment - 2012/04/04 6:01 AM @Michal Again, is this only a Hyperion/Infiniband issue? If so, we should tag all related issues together - perhaps as hyperion_only - so that we can organise JIRAs better.

RH Bugzilla Integration added a comment - 2012/04/03 6:58 PM

Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623

This might be caused by cache view partitions, which can't be detected by the test controller now, because it only checks JGroups view.
I requested https://issues.jboss.org/browse/ISPN-1967
which could help correct this.

RH Bugzilla Integration added a comment - 2012/04/03 6:58 PM Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623 This might be caused by cache view partitions, which can't be detected by the test controller now, because it only checks JGroups view. I requested https://issues.jboss.org/browse/ISPN-1967 which could help correct this.

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

Collapse comment: Mircea Markus (Inactive) added a comment - 2013/08/29 1:20 PM

Expand comment: Mircea Markus (Inactive) added a comment - 2013/08/29 1:20 PM

Collapse comment: Martin Gencur added a comment - 2013/08/26 3:45 AM

Expand comment: Martin Gencur added a comment - 2013/08/26 3:45 AM

Collapse comment: RH Bugzilla Integration added a comment - 2013/08/02 6:04 PM

Expand comment: RH Bugzilla Integration added a comment - 2013/08/02 6:04 PM

Collapse comment: RH Bugzilla Integration added a comment - 2013/08/02 4:48 PM

Expand comment: RH Bugzilla Integration added a comment - 2013/08/02 4:48 PM

Collapse comment: Mircea Markus (Inactive) added a comment - 2013/05/07 5:07 AM

Expand comment: Mircea Markus (Inactive) added a comment - 2013/05/07 5:07 AM

Collapse comment: RH Bugzilla Integration added a comment - 2013/05/06 11:38 PM

Expand comment: RH Bugzilla Integration added a comment - 2013/05/06 11:38 PM

Collapse comment: Radim Vansa (Inactive) added a comment - 2013/02/06 4:35 AM

Expand comment: Radim Vansa (Inactive) added a comment - 2013/02/06 4:35 AM

Collapse comment: Michal Linhard (Inactive) added a comment - 2013/02/05 3:05 PM

Expand comment: Michal Linhard (Inactive) added a comment - 2013/02/05 3:05 PM

Collapse comment: Mircea Markus (Inactive) added a comment - 2013/02/05 9:56 AM

Expand comment: Mircea Markus (Inactive) added a comment - 2013/02/05 9:56 AM

Collapse comment: RH Bugzilla Integration added a comment - 2013/01/17 5:13 AM

Expand comment: RH Bugzilla Integration added a comment - 2013/01/17 5:13 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/12/12 8:47 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/12/12 8:47 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/11/14 9:42 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/11/14 9:42 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/06/05 11:24 PM

Expand comment: RH Bugzilla Integration added a comment - 2012/06/05 11:24 PM

Collapse comment: RH Bugzilla Integration added a comment - 2012/06/05 11:24 PM

Expand comment: RH Bugzilla Integration added a comment - 2012/06/05 11:24 PM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/20 10:53 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/20 10:53 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/20 10:49 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/20 10:49 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/20 9:31 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/20 9:31 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/06 7:08 PM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/06 7:08 PM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/05 6:44 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/05 6:44 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/05 5:43 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/05 5:43 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/05 4:53 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/05 4:53 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/05 4:52 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/05 4:52 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/04 10:07 PM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/04 10:07 PM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/04 12:03 PM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/04 12:03 PM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/04 11:02 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/04 11:02 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/04 8:41 AM

Collapse comment: Michal Linhard (Inactive) added a comment - 2012/04/04 6:12 AM

Expand comment: Michal Linhard (Inactive) added a comment - 2012/04/04 6:12 AM

Collapse comment: Michal Linhard (Inactive) added a comment - 2012/04/04 6:09 AM

Expand comment: Michal Linhard (Inactive) added a comment - 2012/04/04 6:09 AM

Collapse comment: Manik Surtani (Inactive) added a comment - 2012/04/04 6:01 AM

Expand comment: Manik Surtani (Inactive) added a comment - 2012/04/04 6:01 AM

Collapse comment: RH Bugzilla Integration added a comment - 2012/04/03 6:58 PM

Expand comment: RH Bugzilla Integration added a comment - 2012/04/03 6:58 PM

People

Dates

PagerDuty