[ISPN-1704] IllegalStateException in surviving nodes during node crash in cluster

Type: Bug
Resolution: Obsolete
Priority: Major
Fix Version/s: 5.2.0.Final
Affects Version/s: 5.1.0.CR3
Component/s: State Transfer
Labels:
- jdg
- jdg6

Git Pull Request:
https://github.com/infinispan/infinispan/pull/876, https://github.com/infinispan/infinispan/pull/883

This bug appeared in EDG build 96, which contains Infinispan 5.1.0.CR3:
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-build-edg-from-source/96/artifact/edg-srcbuild.zip

Test scenario:

1. start 4 nodes (distributed cache)
2. wait 2 min
3. kill node2
4. wait 2 min
5. start node2
6. wait 2 min and end the test

server side logs:
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-experiments-mlinhard-perflab/8/artifact/report/serverlogs.zip
client side logs:
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-experiments-mlinhard-perflab/8/console-perf05/consoleText
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-experiments-mlinhard-perflab/8/console-perf07/consoleText

After node2 crashed, no further requests succeeded; most of them failed with this error:

ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (HotRodServerWorker-1-43) ISPN000136: Execution error: java.lang.IllegalStateException: Trying to release state transfer shared lock without acquiring it first

Before the error showed up on the client side, the requests had been blocked for around 1.5 min.

Dan Berindei (Inactive) added a comment - 2012/11/02 10:22 AM

No longer relevant since NBST landed.

Dan Berindei (Inactive) added a comment - 2012/01/27 7:05 AM

I just created ISPN-1799.

Manik Surtani (Inactive) added a comment - 2012/01/27 5:27 AM - edited

@Dan have you got a new JIRA for this, to revisit in 5.2? It would be good to link to it here.

Dan Berindei (Inactive) added a comment - 2012/01/23 7:17 AM

The method StateTransferLockImpl.waitForStateTransferToEnd() didn't have any way of signalling
that it failed to re-acquire the state transfer lock.

I've added a new exception, StateTransferLockReacquisitionException, but we'll have to revisit this for 5.2.
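
To illustrate the failure mode described above, here is a minimal single-threaded sketch, not the actual Infinispan code. The class `StateTransferLockSketch`, its `invoke` method, and the nested `ReacquisitionException` are hypothetical stand-ins; only the error message and the idea of signalling a failed re-acquisition with an exception come from this issue. If the wait method returned silently after failing to re-acquire the lock, the caller's `finally` block would release a lock it no longer holds and hit the `IllegalStateException`.

```java
// Hypothetical, simplified model of a state transfer lock (single-threaded sketch).
class StateTransferLockSketch {

    /** Stand-in for the new exception mentioned in the comment above. */
    static class ReacquisitionException extends RuntimeException {
        ReacquisitionException(String message) {
            super(message);
        }
    }

    private boolean transferInProgress; // true while a (simulated) state transfer runs
    private int sharedHolds;            // number of shared-lock holds by the caller

    int sharedHolds() {
        return sharedHolds;
    }

    /** Tries to take the shared lock; refused while a state transfer is running. */
    boolean acquireShared() {
        if (transferInProgress) {
            return false;
        }
        sharedHolds++;
        return true;
    }

    /** Releases the shared lock; fails loudly if it was never acquired. */
    void releaseShared() {
        if (sharedHolds == 0) {
            throw new IllegalStateException(
                "Trying to release state transfer shared lock without acquiring it first");
        }
        sharedHolds--;
    }

    /** Gives up the shared lock and tries to take it back, signalling failure with an exception. */
    void waitForStateTransferToEnd() {
        releaseShared();
        if (!acquireShared()) {
            throw new ReacquisitionException("Could not re-acquire the state transfer lock");
        }
    }

    /** Mimics an interceptor: acquire, run the command, release only if the lock is still held. */
    String invoke(boolean transferStartsMidCommand) {
        if (!acquireShared()) {
            return "rejected";
        }
        boolean held = true;
        try {
            if (transferStartsMidCommand) {
                transferInProgress = true;   // simulate a state transfer starting
                waitForStateTransferToEnd(); // throws: the lock is no longer held
            }
            return "ok";
        } catch (ReacquisitionException e) {
            held = false; // the exception tells us not to release in finally
            return "retry";
        } finally {
            if (held) {
                releaseShared();
            }
        }
    }

    public static void main(String[] args) {
        StateTransferLockSketch lock = new StateTransferLockSketch();
        System.out.println(lock.invoke(false));
        System.out.println(lock.invoke(true));
    }
}
```

Without the exception, `invoke` would have no way to learn that `held` should be false, which matches the symptom reported here.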

Michal Linhard (Inactive) added a comment - 2012/01/20 7:29 AM

I ran it 5 times with 5.1.0.CR4, using the same settings that reproduced the problem on 5.1.0.CR3, but could not reproduce it.

Michal Linhard (Inactive) added a comment - 2012/01/19 10:38 AM

I managed to reproduce it once again with plain Infinispan 5.1.0.CR3 (test with four Hot Rod servers):
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-experiments-mlinhard-perflab/32/artifact/report/serverlogs.zip

Michal Linhard (Inactive) added a comment - 2012/01/18 5:23 AM

Now the challenge is to repeat it with TRACE log.

Michal Linhard (Inactive) added a comment - 2012/01/18 5:22 AM

It appeared in the log again (for 5.1.0.CR4), though in a very different situation:
see node04.log in http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-experiments-mlinhard-perflab/25/artifact/report/serverlogs.zip

Previously it occurred in StateTransferLockInterceptor.visitPutKeyValueCommand, many times, starting shortly after the node crash.
Now it occurs in StateTransferLockInterceptor.visitPrepareCommand, and only twice, at the end of the test.

Manik Surtani (Inactive) added a comment - 2012/01/17 11:48 PM

Thanks for keeping an eye on this one, Dan.

Michal Linhard (Inactive) added a comment - 2012/01/17 5:38 PM

Hmm, one more idea: I'll try setting rehashWait to 5 sec and see if that increases the chance of the exception, because it might be connected with StateTransferInProgressException.

Assignee: Dan Berindei (Inactive)
Reporter: Michal Linhard (Inactive)
Archiver: Amol Dongare
Created: 2012/01/11 7:05 AM
Updated: 2020/02/07 6:10 AM
Resolved: 2012/11/02 10:22 AM
Archived: 2024/11/28 6:21 AM
