[ISPN-2103] Concurrent access after removal of an AtomicMap should NOT result in an IllegalStateException when accessed by other threads

Type: Bug
Resolution: Obsolete
Priority: Major
Fix Version/s: None
Affects Version/s: 5.1.5.FINAL
Component/s: Core
Labels:
None

Bugzilla References:
https://bugzilla.redhat.com/show_bug.cgi?id=922699

~~ISPN-1121~~ introduces an IllegalStateException that is being thrown from AtomicMap methods once the map handle has become stale (ie. removed from cache). In case of concurrent access, the exception is thrown to all threads not just to the thread that performed the removal. This was fine-ish in older versions of Infinispan before introduction of optimistic and pessimistic locking but it should be reconsidered now because:

1. It interferes/overlaps with transaction isolation. We should rely on the selected locking scheme (OL/PL) to detect conflicts between transactions and report the problem accordingly. Especially if the stale map is used just for reading - this should be allowed to work rather than stop it.
2. This exception is pretty disruptive and awkward to handle. All methods of an AtomicMap can result in this exception and the current thread has to be prepared for handling it if other threads remove the map. Not much transaction isolation.
3. Since the TreeCache is backed by AtomicMap nearly all Tree API can throw this.

The proposed fix consists of:
1. removing AtomicHashMap.removed flag and AtomicHashMap.markRemoved() method.
2. revising AtomicHashMapProxy.assertValid() method to check only if the map is null (ie. removed) but no longer use the removed flag.
3. revising ReadCommittedEntry.commit() method to no longer call markRemoved() method.

The consequences of these changes are:
1. Any further access to a stale map results in IllegalStateException ONLY in the thread that performed the removal. This thread 'knows' the map is stale so it is fine to punish it. Other threads remain unaffected until lock acquisition or commit is performed (depending on locking model).
2. Other threads can continue to use the previously obtained map handle for reads without danger of getting an exception.
3. If a write operation is done on the map, the results depend on the locking model:
3.1 optimistic locking + write skew check: a WriteSkewException will stop the commit during prepare
3.2 optimistic locking, no write skew check: the write is committed and the work of the transaction that removed the map is overwritten. The map is effectively revived.
3.3 pessimistic locking: same as 3.2

Please note 3.2 and 3.3 work the same as for normal values (not atomic maps).

relates to

ISPN-7803 Functional commands-based AtomicMap

Closed

Radoslav Husar added a comment - 2017/01/12 10:35 AM

pruivo@redhat.com If I understand the question the right approach seems to be throwing an ISE here, we can't just recreate removed structures.

Radoslav Husar added a comment - 2017/01/12 10:35 AM pruivo@redhat.com If I understand the question the right approach seems to be throwing an ISE here, we can't just recreate removed structures.

Pedro Ruivo added a comment - 2016/09/19 6:38 AM

Hey guys,
I need an opinion. I've changed the code to make the locking mode to handle the conflicts. However, when a transaction starts and the map does not exists, what behavior do you prefer?
1) throw ISE.
2) create a new map.

Pedro Ruivo added a comment - 2016/09/19 6:38 AM Hey guys, I need an opinion. I've changed the code to make the locking mode to handle the conflicts. However, when a transaction starts and the map does not exists, what behavior do you prefer? 1) throw ISE. 2) create a new map.

RH Bugzilla Integration added a comment - 2016/09/08 3:19 PM

Brad Maxwell <bmaxwell@redhat.com> changed the Status of bug 922699 from ASSIGNED to NEW

RH Bugzilla Integration added a comment - 2016/09/08 3:19 PM Brad Maxwell <bmaxwell@redhat.com> changed the Status of bug 922699 from ASSIGNED to NEW

Bola Delamonte (Inactive) added a comment - 2016/03/15 10:04 AM

We are sometimes getting an issue with stale sessions when starting up JBoss EAP 6.4.0 in clustering mode.

Caused by: java.lang.IllegalStateException: AtomicMap stored under key 0pmrbdehmonywqnaupracijayleprpraqblnlqnquinobu148781457430391740 has been concurrently removed!
at org.infinispan.atomic.AtomicHashMapProxy.assertValid(AtomicHashMapProxy.java:149)
at org.infinispan.atomic.AtomicHashMapProxy.getDeltaMapForRead(AtomicHashMapProxy.java:92)
at org.infinispan.atomic.FineGrainedAtomicHashMapProxy.get(FineGrainedAtomicHashMapProxy.java:213)
at at.prod.caching.CachingServiceImpl.getSessionCache(CachingServiceImpl.java:141) [caching.jar:]
at at.prod.caching.CacheStrategy.getAccounts(CacheStrategy.java:75) [account.jar:]

and the resulting previous error in the log:

    ("subsystem" => "infinispan"),
]) - failure description: {"JBAS014671: Failed services" => {"jboss.infinispan.prodCache.prodCache" => "org.jboss.msc.service.StartException in service jboss.infinispan.prodCache.prodCache: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
    Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl
    Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache prodCache on jboss_jboss-prod.prod.at/prodCache"}}
JBAS014777:   Services which failed to start:      service jboss.infinispan.prodCache.prodCache: org.jboss.msc.service.StartException in service jboss.infinispan.prodCache.prodCache: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl

Any ideas?

Bola Delamonte (Inactive) added a comment - 2016/03/15 10:04 AM We are sometimes getting an issue with stale sessions when starting up JBoss EAP 6.4.0 in clustering mode. Caused by: java.lang.IllegalStateException: AtomicMap stored under key 0pmrbdehmonywqnaupracijayleprpraqblnlqnquinobu148781457430391740 has been concurrently removed! at org.infinispan.atomic.AtomicHashMapProxy.assertValid(AtomicHashMapProxy.java:149) at org.infinispan.atomic.AtomicHashMapProxy.getDeltaMapForRead(AtomicHashMapProxy.java:92) at org.infinispan.atomic.FineGrainedAtomicHashMapProxy.get(FineGrainedAtomicHashMapProxy.java:213) at at.prod.caching.CachingServiceImpl.getSessionCache(CachingServiceImpl.java:141) [caching.jar:] at at.prod.caching.CacheStrategy.getAccounts(CacheStrategy.java:75) [account.jar:] and the resulting previous error in the log: ( "subsystem" => "infinispan" ), ]) - failure description: { "JBAS014671: Failed services" => { "jboss.infinispan.prodCache.prodCache" => "org.jboss.msc.service.StartException in service jboss.infinispan.prodCache.prodCache: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl Caused by: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl Caused by: org.infinispan.CacheException: Initial state transfer timed out for cache prodCache on jboss_jboss-prod.prod.at/prodCache"}} JBAS014777: Services which failed to start: service jboss.infinispan.prodCache.prodCache: org.jboss.msc.service.StartException in service jboss.infinispan.prodCache.prodCache: org.infinispan.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.InterruptedException on object of type StateTransferManagerImpl Any ideas?

Adrian Nistor (Inactive) added a comment - 2012/06/12 6:53 AM

Adding Manik Surtani's comments received on infinispan-dev list:

I had a quick look. The way AtomicMaps are used is that they are retrieved using the AtomicMapLookup and an AtomicMapProxy is passed back. This proxy is then used, spanning multiple transactions. Each caller thread gets its own AtomicMapProxy instance, all referring to a single AtomicHashMap instance which is actually stored in the cache.

So, for readers, just checking whether the AtomicHashMap is null isn't enough since the caller's AtomicMapProxy may have a handle onto an AtomicHashMap that has since been removed.

I think the 'removed' flag should remain, and if, on the proxy at any given time, removed == true, then:

1. For an existing transaction, it should allow the OL/PL code handle anything to do with write skews
2. For a new transaction starting up, re-establish the reference to the AtomicHashMap from the proxy (if it exists), otherwise throw an ISE (new transaction cannot start since it is referring to an instance that doesn't exist anymore).

WDYT?

Cheers
Manik

Adrian Nistor (Inactive) added a comment - 2012/06/12 6:53 AM Adding Manik Surtani's comments received on infinispan-dev list: I had a quick look. The way AtomicMaps are used is that they are retrieved using the AtomicMapLookup and an AtomicMapProxy is passed back. This proxy is then used, spanning multiple transactions. Each caller thread gets its own AtomicMapProxy instance, all referring to a single AtomicHashMap instance which is actually stored in the cache. So, for readers, just checking whether the AtomicHashMap is null isn't enough since the caller's AtomicMapProxy may have a handle onto an AtomicHashMap that has since been removed. I think the 'removed' flag should remain, and if, on the proxy at any given time, removed == true, then: 1. For an existing transaction, it should allow the OL/PL code handle anything to do with write skews 2. For a new transaction starting up, re-establish the reference to the AtomicHashMap from the proxy (if it exists), otherwise throw an ISE (new transaction cannot start since it is referring to an instance that doesn't exist anymore). WDYT? Cheers Manik

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

Collapse comment: Radoslav Husar added a comment - 2017/01/12 10:35 AM

Expand comment: Radoslav Husar added a comment - 2017/01/12 10:35 AM

Collapse comment: Pedro Ruivo added a comment - 2016/09/19 6:38 AM

Expand comment: Pedro Ruivo added a comment - 2016/09/19 6:38 AM

Collapse comment: RH Bugzilla Integration added a comment - 2016/09/08 3:19 PM

Expand comment: RH Bugzilla Integration added a comment - 2016/09/08 3:19 PM

Collapse comment: Bola Delamonte (Inactive) added a comment - 2016/03/15 10:04 AM

Expand comment: Bola Delamonte (Inactive) added a comment - 2016/03/15 10:04 AM

Collapse comment: Adrian Nistor (Inactive) added a comment - 2012/06/12 6:53 AM

Expand comment: Adrian Nistor (Inactive) added a comment - 2012/06/12 6:53 AM

People

Dates

PagerDuty