
[ISPN-2550] NoSuchElementException in Hot Rod Encoder

This issue belongs to an archived project. You can view it, but you can't modify it.

    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • Affects Version: 5.2.0.Final
    • Fix Version: 5.2.0.Final
    • Component: Remote Protocols
    • Labels: None

      Tomas noticed this a while ago in a specific functional test:
      https://bugzilla.redhat.com/show_bug.cgi?id=875151

      I'm creating a more general JIRA, because I'm also hitting this in a resilience test.

      What I found by a quick debug session is that here:

      https://github.com/infinispan/infinispan/blob/master/server/hotrod/src/main/scala/org/infinispan/server/hotrod/Encoders.scala#L106

      for (segmentIdx <- 0 until numSegments) {
         val denormalizedSegmentHashIds = allDenormalizedHashIds(segmentIdx)
         val segmentOwners = ch.locateOwnersForSegment(segmentIdx)
         for (ownerIdx <- 0 until segmentOwners.length) {
            val address = segmentOwners(ownerIdx % segmentOwners.size)
            val serverAddress = members(address)
            val hashId = denormalizedSegmentHashIds(ownerIdx)
            log.tracef("Writing hash id %d for %s:%s", hashId, serverAddress.host, serverAddress.port)
            writeString(serverAddress.host, buf)
            writeUnsignedShort(serverAddress.port, buf)
            buf.writeInt(hashId)
         }
      }
      

      we're trying to obtain the serverAddress for a nonexistent address, and the resulting NoSuchElementException is not handled properly.
      It happens after I kill a node in the resilience test: the exception appears when querying for the killed node in the members cache.
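      The failure mode can be sketched in isolation (all names below are illustrative, not the actual Infinispan types): `members` behaves like a plain Scala `Map`, and `members(address)` throws `NoSuchElementException` when a segment owner's address belongs to a node that has just left the cluster. A defensive `members.get(address)` returns an `Option` and would let the encoder skip dead owners instead of failing:

```scala
// Minimal sketch of the failing lookup, assuming hypothetical types.
object MembersLookupSketch {
  case class ServerAddress(host: String, port: Int)

  // Members cache; "node2" was just killed and is no longer present.
  val members: Map[String, ServerAddress] =
    Map("node1" -> ServerAddress("127.0.0.1", 11222))

  // members("node2") would throw NoSuchElementException - the bug above.
  // A defensive lookup returns Option so the caller can skip dead owners:
  def lookup(address: String): Option[ServerAddress] = members.get(address)

  def main(args: Array[String]): Unit = {
    assert(lookup("node1").contains(ServerAddress("127.0.0.1", 11222)))
    assert(lookup("node2").isEmpty) // dead owner skipped instead of throwing
    println("ok")
  }
}
```

      In the encoder loop this would mean matching on the `Option` and skipping (or logging) owners missing from `members`; whether the real fix should skip, log, or recompute the topology is a design decision for the actual patch.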


            Michal Linhard <mlinhard@redhat.com> made a comment on bug 886565

            Verified for 6.1.0.ER8


            Michal Linhard <mlinhard@redhat.com> changed the Status of bug 886565 from ON_QA to VERIFIED


            Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 886565 from MODIFIED to ON_QA


            Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 886565 from ASSIGNED to MODIFIED


            Dan Berindei (Inactive) added a comment:

            Sorry Michal, I didn't refresh the JIRA page before posting my comment.

            I'm glad the fix works; I'll try to get a unit test working as well before issuing a PR, though.

            Michal Linhard (Inactive) added a comment:

            I've reduced the number of entries in the cache during the test to 5000 1kB entries and I've got a clean resilience test run:

            http://www.qa.jboss.com/~mlinhard/hyperion3/run0013/report/stats-throughput.png

            Only expected exceptions:

            http://www.qa.jboss.com/~mlinhard/hyperion3/run0013/report/loganalysis/server/

            There is still a problem with uneven request balancing (ISPN-2632) and with the whole system blocking after a join when there's more data (5% of heap filled), but that may not be related to the issues we're discussing here.

            Tristan Tarrant <ttarrant@redhat.com> changed the Status of bug 886565 from NEW to ASSIGNED


            Michal Linhard (Inactive) added a comment:

            As I said, ISPN-2642 didn't appear, so it seems to be fixed. I'm now investigating other problems I have with that test run.

            Dan Berindei (Inactive) added a comment:

            @Galder, could we modify the JIRA subject to say that this one happens during leave and the other happens during join, then?

            @Michal, commit https://github.com/danberindei/infinispan/commit/754b9de995221075e14bba7fa459e597bdb16287 should fix ISPN-2642 as well, have you tested it?

            Michal Linhard (Inactive) added a comment:

            I've patched JDG 6.1.0.ER5 by replacing the infinispan-core and infinispan-server-hotrod jars with ones built from Dan's branch, and ran resilience tests in hyperion:

            http://www.qa.jboss.com/~mlinhard/hyperion3/run0011/report/stats-throughput.png

            The issues ISPN-2550 and ISPN-2642 didn't appear, but the run still wasn't OK. After the rejoin of the killed node0002, all operations were blocked for more than 5 minutes, i.e. zero throughput in the last stage of the test. I'm investigating what happened there.

              dberinde@redhat.com Dan Berindei (Inactive)
              mlinhard Michal Linhard (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare

                Created:
                Updated:
                Resolved:
                Archived: