
[ISPN-2550] NoSuchElementException in Hot Rod Encoder

This issue belongs to an archived project. You can view it, but you can't modify it.

    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • Affects Version: 5.2.0.Final
    • Fix Version: 5.2.0.Final
    • Component: Remote Protocols
    • Labels: None

      Tomas noticed this a while ago in a specific functional test:
      https://bugzilla.redhat.com/show_bug.cgi?id=875151

      I'm creating a more general JIRA, because I'm also hitting this in a resilience test.

      What I found by a quick debug session is that here:

      https://github.com/infinispan/infinispan/blob/master/server/hotrod/src/main/scala/org/infinispan/server/hotrod/Encoders.scala#L106

      for (segmentIdx <- 0 until numSegments) {
         val denormalizedSegmentHashIds = allDenormalizedHashIds(segmentIdx)
         val segmentOwners = ch.locateOwnersForSegment(segmentIdx)
         for (ownerIdx <- 0 until segmentOwners.length) {
            val address = segmentOwners(ownerIdx % segmentOwners.size)
            val serverAddress = members(address)
            val hashId = denormalizedSegmentHashIds(ownerIdx)
            log.tracef("Writing hash id %d for %s:%s", hashId, serverAddress.host, serverAddress.port)
            writeString(serverAddress.host, buf)
            writeUnsignedShort(serverAddress.port, buf)
            buf.writeInt(hashId)
         }
      }
      

      we're trying to obtain the serverAddress for a nonexistent address, and the resulting NoSuchElementException is not handled properly.
      It happens after I kill a node in the resilience test: the exception appears when querying for the killed node in the members cache.
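      The failure mode can be sketched in isolation (all names below are illustrative, not the actual Infinispan types): `members` behaves like a plain Scala `Map`, and `members(address)` throws `NoSuchElementException` when a segment owner's address belongs to a node that has just left the cluster. A defensive `members.get(address)` returns an `Option` and would let the encoder skip dead owners instead of failing:

```scala
// Minimal sketch of the failing lookup, assuming hypothetical types.
object MembersLookupSketch {
  case class ServerAddress(host: String, port: Int)

  // Members cache; "node2" was just killed and is no longer present.
  val members: Map[String, ServerAddress] =
    Map("node1" -> ServerAddress("127.0.0.1", 11222))

  // members("node2") would throw NoSuchElementException - the bug above.
  // A defensive lookup returns Option so the caller can skip dead owners:
  def lookup(address: String): Option[ServerAddress] = members.get(address)

  def main(args: Array[String]): Unit = {
    assert(lookup("node1").contains(ServerAddress("127.0.0.1", 11222)))
    assert(lookup("node2").isEmpty) // dead owner skipped instead of throwing
    println("ok")
  }
}
```

      In the encoder loop this would mean matching on the `Option` and skipping (or logging) owners missing from `members`; whether the real fix should skip, log, or recompute the topology is a design decision for the actual patch.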


            Michal Linhard <mlinhard@redhat.com> made a comment on bug 886565

            Verified for 6.1.0.ER8


            Michal Linhard <mlinhard@redhat.com> changed the Status of bug 886565 from ON_QA to VERIFIED


            Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 886565 from MODIFIED to ON_QA


            Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 886565 from ASSIGNED to MODIFIED


            Dan Berindei (Inactive) added a comment:

            Sorry Michal, I didn't refresh the JIRA page before posting my comment.

            I'm glad the fix works; I'll try to get a unit test working as well before issuing a PR, though.

            Michal Linhard (Inactive) added a comment:

            I've reduced the number of entries in the cache during the test to 5000 1kB entries and I've got a clean resilience test run:

            http://www.qa.jboss.com/~mlinhard/hyperion3/run0013/report/stats-throughput.png

            Only expected exceptions:

            http://www.qa.jboss.com/~mlinhard/hyperion3/run0013/report/loganalysis/server/

            There is still a problem with uneven request balancing (ISPN-2632) and with the whole system blocking after a join when there's more data (5% of heap filled), but that may not be related to the issues we're discussing here.

            Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 886565 from NEW to ASSIGNED


            Michal Linhard (Inactive) added a comment:

            As I said, ISPN-2642 didn't appear, so it seems to be fixed. I'm now investigating other problems I have with that test run.

            Dan Berindei (Inactive) added a comment:

            @Galder, could we modify the JIRA subject to say that this one happens during leave and the other happens during join, then?

            @Michal, commit https://github.com/danberindei/infinispan/commit/754b9de995221075e14bba7fa459e597bdb16287 should fix ISPN-2642 as well, have you tested it?

            Michal Linhard (Inactive) added a comment:

            I've patched JDG 6.1.0.ER5 by replacing the infinispan-core and infinispan-server-hotrod jars with ones built from Dan's branch, and ran resilience tests in hyperion:

            http://www.qa.jboss.com/~mlinhard/hyperion3/run0011/report/stats-throughput.png

            The issues ISPN-2550 and ISPN-2642 didn't appear, but the run still wasn't OK. After the rejoin of the killed node0002, all operations were blocked for more than 5 minutes, i.e. zero throughput in the last stage of the test. I'm investigating what happened there.

              dberinde@redhat.com Dan Berindei (Inactive)
              mlinhard Michal Linhard (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare

                Created:
                Updated:
                Resolved:
                Archived: