Infinispan / ISPN-1965

Some entries not available during view change


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • Affects Version: 5.1.3.FINAL

      In the 4-node, dist-mode, numOwners=2 elasticity test
      http://www.qa.jboss.com/~mlinhard/hyperion/run44-elas-dist/

      there is a period of roughly 90 seconds during which clients get null responses to GET
      requests on entries that should exist in the cache.

      First occurrence:
      hyperion1139.log 05:31:01,202 286.409
      Last occurrence:
      hyperion1135.log 05:32:45,441 390.648
      Total occurrence count (across all 19 driver nodes):
      152241
      (This doesn't mean it happens for 152K keys, because each key is retried after an
      erroneous attempt.)

      The data doesn't seem to be lost: these errors cease after a while and the number of
      entries returns to normal (see cache_entries.csv).

      This happens approximately in the period between the kill of node0001 and the formation
      of the cluster {node0002 - node0004}, and shortly after.
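      For context, below is a minimal sketch of the kind of check a driver node performs: a Hot Rod GET that is retried when it unexpectedly returns null for a key stored during the data-loading phase. The host, port, key and retry policy are illustrative assumptions, and it uses the current Hot Rod Java client API rather than the exact 5.1-era client from this run.

      // A minimal sketch (not the original test driver): retry a Hot Rod GET that
      // unexpectedly returns null for a key loaded earlier. Host, port, key and the
      // retry policy are illustrative assumptions.
      import org.infinispan.client.hotrod.RemoteCache;
      import org.infinispan.client.hotrod.RemoteCacheManager;
      import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

      public class NullGetProbe {
          public static void main(String[] args) throws InterruptedException {
              ConfigurationBuilder builder = new ConfigurationBuilder();
              builder.addServer().host("node0002").port(11222); // hypothetical surviving node

              try (RemoteCacheManager rcm = new RemoteCacheManager(builder.build())) {
                  RemoteCache<String, String> cache = rcm.getCache();
                  String key = "key-000042"; // assumed to exist after the loading phase

                  for (int attempt = 1; attempt <= 10; attempt++) {
                      String value = cache.get(key);
                      if (value != null) {
                          System.out.printf("attempt %d: value present for %s%n", attempt, key);
                          break;
                      }
                      // This is the symptom reported here: null for an entry that should exist.
                      System.out.printf("attempt %d: null response for %s%n", attempt, key);
                      Thread.sleep(1000); // back off and retry, as the driver does
                  }
              }
          }
      }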

            [ISPN-1965] Some entries not available during view change

            Mircea Markus (Inactive) added a comment - mgencur There are two types of view changes: nodes leaving/joining a partition, and the cluster splitting into two (or more) sub-partitions (split-brain). NBST only works in the scope of the former.

            Martin Gencur added a comment - Mircea, I thought NBST was implemented so that the data is available during the view change. Can you please shed some light on the part of the design which allows for data not to be available during a view change, and thus why this issue was rejected? Thanks

            Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 808623 from MODIFIED to ON_QA

            Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 808623 from ASSIGNED to MODIFIED

            Mircea Markus (Inactive) added a comment - we don't offer any consistency guarantees after partition healing.

            Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

            Set flag to nominate this bug for 6.2 release notes.

            Radim Vansa (Inactive) added a comment - I don't test partition splits, as the result is obvious and merge is not fully implemented yet. Nevertheless, InfinispanPartitionableWrapper is implemented and you may use it; however, I really don't know what should be tested with it now (I implemented it after reading an article which turned out to be just a design doc, not an implementation status).

            Michal Linhard (Inactive) added a comment - Yes, I think we can still simulate a scenario with a network partition where we lose data. I haven't done that lately, since I know that recovering from partitions after a merge isn't implemented yet. Maybe rvansa1@redhat.com has seen some of these symptoms recently during his RadarGun resilience tests?

            Mircea Markus (Inactive) added a comment - mlinhard is this still relevant with NBST?

            Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623

            @Tristan, this probably won't get fixed until Eventual Consistency is implemented or dealing with partitions is somehow solved. So 6.1.0.ER9 isn't the right milestone for this ....

            Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 808623 from NEW to ASSIGNED

            mark yarborough <myarboro@redhat.com> made a comment on bug 808623

            ttarrant will add jira links as appropriate.

            Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

            Technical note updated. If any revisions are required, please edit the "Technical Notes" field
            accordingly. All revisions will be proofread by the Engineering Content Services team.

            Diffed Contents:
            @@ -1,12 +1,9 @@
            -In rare circumstances, when a node leaves the cluster, instead of going
            -directly to a new cluster view that displays all nodes save the note that has departed, the cluster splits into two partitions which then merge after a short amount of time. During this time, some nodes do not have access to all the data that previously existed in the cache. After the merge, all nodes regain access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster.
            +In rare circumstances, when a node leaves the cluster, instead of going directly to a new cluster view that displays all nodes save the note that has departed, the cluster splits into two partitions which then merge after a short amount of time. During this time, some nodes do not have access to all the data that previously existed in the cache. After the merge, all nodes regain access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster.
            </para>
            <para>
            Normally, when the view changes because a node joins or leaves, the cache data is
            rebalanced on the new cluster members. However, if the number of nodes that leaves the cluster in quick succession equals or is greater than the value of numOwners, keys for the departed nodes are lost. This occurs during a network split as well - regardless of the reasons for the partitions forming, at least one partition will not have all the data (assuming cluster size is greater than numOwners).
            </para>
            <para>
            -While there are multiple partitions, each one can make changes to the data
            -independently, so a remote client will see inconsistencies in the data. When
            -merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so
            +While there are multiple partitions, each one can make changes to the data independently, so a remote client will see inconsistencies in the data. When merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so
            different nodes may hold different values even after the merge.

            Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

            This bug is nominated as a known issue for JDG 6 GA Release Notes. If this is not meant to be included till 6.1, perhaps we should exclude this for now. Setting NEEDINFO to Mark to set this to technical_note+ to exclude it, if needed.

            Manik Surtani <msurtani@redhat.com> made a comment on bug 808623

            No, EC may be a 7.0 feature. A lot of people I speak to in the community don't see this as a priority.

            Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623

            I guess this one doesn't have a quick solution and should be postponed.
            It's basically about Infinispan not being able to handle partitions. We're waiting for the eventual consistency feature with this, aren't we?
            So 6.1.0.GA?

            Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

            Technical note updated. If any revisions are required, please edit the "Technical Notes" field
            accordingly. All revisions will be proofread by the Engineering Content Services team.

            Diffed Contents:
            @@ -1,4 +1,12 @@
            -When a number of nodes larger than the value of numOwner leave a cluster, JBoss Data Grid cannot guarantee that all key values are preserved. In a four node cluster, each partition has two nodes. As a result, each partition loses a number of nodes that equals the value of numOwner and keys that exist prior to the nodes leaving the cluster may not be preserved in both partitions.
            +In rare circumstances, when a node leaves the cluster, instead of going
            +directly to a new cluster view that displays all nodes save the note that has departed, the cluster splits into two partitions which then merge after a short amount of time. During this time, some nodes do not have access to all the data that previously existed in the cache. After the merge, all nodes regain access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster.
            </para>
            <para>
            -When partitions are merged into a single cluster, key values are preserved in the new cluster (assuming that no clients modified these values during the network split). If a client modified a key during the network split, the old value may be retrieved, the new value may be retrieved, and in some cases the old value may be retrieved after the old value is retrieved. This policy applies to creation and removal as well, if the missing key is equated with a null value..+Normally, when the view changes because a node joins or leaves, the cache data is
            +rebalanced on the new cluster members. However, if the number of nodes that leaves the cluster in quick succession equals or is greater than the value of numOwners, keys for the departed nodes are lost. This occurs during a network split as well - regardless of the reasons for the partitions forming, at least one partition will not have all the data (assuming cluster size is greater than numOwners).
            +</para>
            +<para>
            +While there are multiple partitions, each one can make changes to the data
            +independently, so a remote client will see inconsistencies in the data. When
            +merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so
            +different nodes may hold different values even after the merge.

            Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

            Technical note updated. If any revisions are required, please edit the "Technical Notes" field
            accordingly. All revisions will be proofread by the Engineering Content Services team.

            Diffed Contents:
            @@ -1,3 +1,4 @@
            When a number of nodes larger than the value of numOwner leave a cluster, JBoss Data Grid cannot guarantee that all key values are preserved. In a four node cluster, each partition has two nodes. As a result, each partition loses a number of nodes that equals the value of numOwner and keys that exist prior to the nodes leaving the cluster may not be preserved in both partitions.
            -
            +</para>
            +<para>
            When partitions are merged into a single cluster, key values are preserved in the new cluster (assuming that no clients modified these values during the network split). If a client modified a key during the network split, the old value may be retrieved, the new value may be retrieved, and in some cases the old value may be retrieved after the old value is retrieved. This policy applies to creation and removal as well, if the missing key is equated with a null value..

            Dan Berindei <dberinde@redhat.com> made a comment on bug 808623

            Misha, after reading it again I think it could be a little clearer. So here's another attempt:

            In rare circumstances, when a node leaves the cluster, instead of going directly to a new cluster view that contains everyone but the leaver, the cluster splits into two partitions which then merge after a short amount of time. During this time, at least some nodes will not have access to all the data that previously existed in the cache. After the merge, all the nodes will again have access to all the data, but changes made during the split may be lost or be visible only to a part of the cluster.

            Normally, when the view changes because of a join or a leave, the cache data is rebalanced on the new cluster members. However, if numOwners or more nodes leave in quick succession, keys for which all nodes have left will be lost. The same thing happens during a network split - regardless how the partitions form, there will be at least one partition that doesn't have all the data (assuming cluster size > numOwners).

            While there are multiple partitions, each one can make changes to the data independently, so a remote client will see inconsistencies in the data. When merging, JBoss Data Grid does not attempt to resolve these inconsistencies, so different nodes may hold different values even after the merge.

            Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

            Dan, please correct if anything is not accurate in the technical notes field, or remove the NeedInfo if you approve.

            Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

            Technical note updated. If any revisions are required, please edit the "Technical Notes" field
            accordingly. All revisions will be proofread by the Engineering Content Services team.

            Diffed Contents:
            @@ -1 +1,3 @@
            -When the view is changed, some entries are unavailable to some clients, despite existing in the cluster and being loaded in the data loading phase. The total number of entries (retrieved by JMX) is correct, therefore the missing entries are not lost. This error occurs for a brief period of time and then ceases.+When a number of nodes larger than the value of numOwner leave a cluster, JBoss Data Grid cannot guarantee that all key values are preserved. In a four node cluster, each partition has two nodes. As a result, each partition loses a number of nodes that equals the value of numOwner and keys that exist prior to the nodes leaving the cluster may not be preserved in both partitions.
            +
            +When partitions are merged into a single cluster, key values are preserved in the new cluster (assuming that no clients modified these values during the network split). If a client modified a key during the network split, the old value may be retrieved, the new value may be retrieved, and in some cases the old value may be retrieved after the old value is retrieved. This policy applies to creation and removal as well, if the missing key is equated with a null value..

            Dan Berindei <dberinde@redhat.com> made a comment on bug 808623

            Infinispan doesn't guarantee anything when more than numOwners nodes leave the cluster. When we have a split in a 4-node cluster and each partition has 2 nodes, that means each partition will have lost numOwners nodes, and Infinispan can't guarantee that all the pre-existing keys will be kept in both partitions.

            When the partitions merge and we get a single cluster, the key values are usually preserved in the new cluster - assuming that no client modifies the values during the network split. If a client modified a key during the split, Infinispan doesn't offer any guarantees: a client could retrieve the old value or the new value (and it could retrieve the old value after it retrieved the new value). This policy applies for creation/removal as well, if we equate a missing key with a null value.
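            To make the numOwners arithmetic above concrete, here is a minimal embedded-configuration sketch of a distributed cache with numOwners=2; the cache name and the use of the programmatic ConfigurationBuilder API are illustrative assumptions, not the configuration used in this test.

            // A minimal sketch (assumed names/values): a DIST_SYNC cache with numOwners=2,
            // i.e. each key is stored on two nodes, so losing two owners in quick succession
            // (or splitting a 4-node cluster 2/2) can leave a partition without some keys.
            import org.infinispan.configuration.cache.CacheMode;
            import org.infinispan.configuration.cache.Configuration;
            import org.infinispan.configuration.cache.ConfigurationBuilder;
            import org.infinispan.configuration.global.GlobalConfigurationBuilder;
            import org.infinispan.manager.DefaultCacheManager;
            import org.infinispan.manager.EmbeddedCacheManager;

            public class DistNumOwnersConfig {
                public static void main(String[] args) {
                    EmbeddedCacheManager cm = new DefaultCacheManager(
                            GlobalConfigurationBuilder.defaultClusteredBuilder().build());

                    Configuration distCfg = new ConfigurationBuilder()
                            .clustering()
                                .cacheMode(CacheMode.DIST_SYNC)
                                .hash().numOwners(2) // two copies of every entry
                            .build();

                    cm.defineConfiguration("testCache", distCfg); // hypothetical cache name
                    cm.getCache("testCache").put("k", "v");
                    cm.stop();
                }
            }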

            Misha H. Ali <mhusnain@redhat.com> made a comment on bug 808623

            Technical note updated. If any revisions are required, please edit the "Technical Notes" field
            accordingly. All revisions will be proofread by the Engineering Content Services team.

            Diffed Contents:
            @@ -1,5 +1 @@
            -CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6.
            +When the view is changed, some entries are unavailable to some clients, despite existing in the cluster and being loaded in the data loading phase. The total number of entries (retrieved by JMX) is correct, therefore the missing entries are not lost. This error occurs for a brief period of time and then ceases.-
            -This is a new bug not yet thoroughly investigated.
            -
            -I can only tell the symptoms: during a view change (probably when a partition occurs) some entries aren't available to certain clients although they exist somewhere in the cluster - it was loaded in the data loading phase. The data isn't lost though. Total number of entries (retrieved via JMX) is correct throughout the test. These errors occurs only during a brief period of time and then cease.

            Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623

            Technical note updated. If any revisions are required, please edit the "Technical Notes" field
            accordingly. All revisions will be proofread by the Engineering Content Services team.

            Diffed Contents:
            @@ -1 +1,5 @@
            -CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6.+CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6.
            +
            +This is a new bug not yet thoroughly investigated.
            +
            +I can only tell the symptoms: during a view change (probably when a partition occurs) some entries aren't available to certain clients although they exist somewhere in the cluster - it was loaded in the data loading phase. The data isn't lost though. Total number of entries (retrieved via JMX) is correct throughout the test. These errors occurs only during a brief period of time and then cease.

            Manik Surtani <msurtani@redhat.com> made a comment on bug 808623

            Technical note added. If any revisions are required, please edit the "Technical Notes" field
            accordingly. All revisions will be proofread by the Engineering Content Services team.

            New Contents:
            CCFR - Michal Linhard. If this only pertains to Infiniband/RDMA, then this is a low prio and non-critical to JDG 6.

            Michal Linhard (Inactive) added a comment - As you can see here, hyperion hosts have two interfaces: https://docspace.corp.redhat.com/docs/DOC-93047
            From the IPs used in the tests you can see that all of them are mapped to eth0 now.

            Michal Linhard (Inactive) added a comment - It happened on hyperion, but Infiniband is not used anymore. We abandoned the Infiniband network after an e-mail discussion; it's pure Ethernet now.
            For ER6 all elasticity/resilience tests were run on hyperion - again with Ethernet.

            What happens on hyperion is perfectly valid now. Several issues happen consistently on both edg-perflab and hyperion.

            Manik Surtani (Inactive) added a comment - @Michal Again, is this only a Hyperion/Infiniband issue? If so, we should tag all related issues together - perhaps as hyperion_only - so that we can organise JIRAs better.

            Michal Linhard <mlinhard@redhat.com> made a comment on bug 808623

            This might be caused by cache view partitions, which can't be detected by the test controller now, because it only checks JGroups view.
            I requested https://issues.jboss.org/browse/ISPN-1967
            which could help correct this.
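            To illustrate the distinction drawn above between the JGroups view and the cache view, here is a minimal embedded-API sketch that prints both; the cache name is an assumption, and it uses the current DistributionManager.getCacheTopology() API rather than the 5.1-era cache views API.

            // A minimal sketch (assumed cache name): compare the JGroups view (all nodes in the
            // transport channel) with the cache topology members (nodes that own this cache's
            // data). During the window described in this issue the two lists can differ.
            import org.infinispan.Cache;
            import org.infinispan.configuration.cache.CacheMode;
            import org.infinispan.configuration.cache.ConfigurationBuilder;
            import org.infinispan.configuration.global.GlobalConfigurationBuilder;
            import org.infinispan.manager.DefaultCacheManager;
            import org.infinispan.manager.EmbeddedCacheManager;

            public class ViewVsTopology {
                public static void main(String[] args) {
                    EmbeddedCacheManager cm = new DefaultCacheManager(
                            GlobalConfigurationBuilder.defaultClusteredBuilder().build());
                    cm.defineConfiguration("testCache", // hypothetical cache name
                            new ConfigurationBuilder().clustering().cacheMode(CacheMode.DIST_SYNC).build());
                    Cache<Object, Object> cache = cm.getCache("testCache");

                    // JGroups view: every node in the transport channel.
                    System.out.println("JGroups view: " + cm.getMembers());

                    // Cache topology: the members that currently own the cache's data.
                    System.out.println("Cache topology: "
                            + cache.getAdvancedCache().getDistributionManager().getCacheTopology().getMembers());

                    cm.stop();
                }
            }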

              Assignee: Dan Berindei (Inactive) (dberinde@redhat.com)
              Reporter: Michal Linhard (Inactive) (mlinhard)
              Archiver: Amol Dongare (rhn-support-adongare)