Type: Bug
Resolution: Done
Priority: Major
I have a customer who is running into an issue where modifying realm settings causes RH-SSO to perform poorly for 10-30 minutes:
<quote>
Changing certain realm settings occasionally causes Infinispan timeouts on one of the nodes. When this occurs, the affected node becomes non-performant for a period of 10-30 minutes.
Enabling/disabling brute force detection seems to reproduce the issue reliably. Changing other realm settings, such as those under user federation providers, does not reproduce it as frequently. Changing realm clients never seems to reproduce it.
</quote>
I had the customer enable trace-level logging and then reproduce the issue. Our clustering specialist reviewed the logs and made the following observations:
didm0: sends request
2016-10-20 14:19:09,544 DEBUG [org.infinispan.interceptors.InvalidationInterceptor] (default task-50) Cache [aus2-sso-didm0] replicating InvalidateCommand
...
2016-10-20 14:19:09,544 TRACE [org.jgroups.protocols.TCP] (default task-50) aus2-sso-didm0: sending msg to aus2-sso-didm1, src=aus2-sso-didm0, headers are RequestCorrelator: corr_id=200, type=REQ, req_id=14938, rsp_expected=true, FORK: ee:keycloak, UNICAST3: DATA, seqno=17516, conn_id=3, TP: [cluster_name=ee]
didm1: receives request
2016-10-20 14:19:09,548 TRACE [org.jgroups.protocols.TCP] (thread-17,ee,aus2-sso-didm1) aus2-sso-didm1: received [dst: aus2-sso-didm1, src: aus2-sso-didm0 (4 headers), size=73 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER], headers are RequestCorrelator: corr_id=200, type=REQ, req_id=14938, rsp_expected=true, FORK: ee:keycloak, UNICAST3: DATA, seqno=17516, conn_id=3, TP: [cluster_name=ee]
...
2016-10-20 14:19:09,549 TRACE [org.jgroups.blocks.RequestCorrelator] (thread-17,ee,aus2-sso-didm1) calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 14938
didm1: processes request on thread remote-thread--p7-t7
There don't appear to be any long stalls in processing.
didm0: timeout after about 17 seconds
2016-10-20 14:19:27,045 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-50) ISPN000136: Error executing command RemoveCommand, writing keys [dev]: org.infinispan.util.concurrent.TimeoutException: Replication timeout for aus2-sso-didm1
didm1: sends reply over a minute later
2016-10-20 14:20:14,033 TRACE [org.jgroups.blocks.RequestCorrelator] (remote-thread--p7-t7) sending rsp for 14938 to aus2-sso-didm0
...
2016-10-20 14:20:14,033 TRACE [org.jgroups.protocols.TCP] (remote-thread--p7-t7) aus2-sso-didm1: sending msg to aus2-sso-didm0, src=aus2-sso-didm1, headers are RequestCorrelator: corr_id=200, type=RSP, req_id=14938, rsp_expected=true, FORK: ee:keycloak, UNICAST3: DATA, seqno=18246, TP: [cluster_name=ee]
Infinispan is sending an invalidation because an entry was updated.
Keycloak intercepts that single invalidation command (in org.keycloak.models.cache.infinispan.CacheManager)
and triggers almost 17,000 individual evictions, which takes over a minute. But the sender times out
after about 17 seconds, the configured replication timeout, which is normally ample since an invalidation would usually complete very quickly.
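The fan-out the specialist describes can be sketched in plain Java. The names below (FanOutCache, invalidateRealm, the reverse-dependency index) are illustrative stand-ins, not Keycloak's actual CacheManager API; the point is that one invalidation message costs work proportional to the number of dependent cache entries, not to the single change that triggered it:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the observed pattern: a single InvalidateCommand
// for one realm key fans out to an eviction of every cached entry derived
// from that realm.
class FanOutCache {
    private final Map<String, Object> entries = new HashMap<>();
    // Reverse index: realm id -> keys of entries derived from that realm.
    private final Map<String, Set<String>> dependents = new HashMap<>();

    void put(String key, Object value, String realmId) {
        entries.put(key, value);
        dependents.computeIfAbsent(realmId, r -> new HashSet<>()).add(key);
    }

    // Called when one invalidation arrives for a realm; returns how many
    // individual evictions it triggered (~17,000 in the customer's logs).
    int invalidateRealm(String realmId) {
        Set<String> keys = dependents.remove(realmId);
        if (keys == null) {
            return 0;
        }
        for (String key : keys) {
            entries.remove(key); // one eviction per dependent entry
        }
        return keys.size();
    }

    int size() {
        return entries.size();
    }
}
```

This is why the sender's timeout is mismatched: the caller budgets for the cost of one remote invalidation, while the receiver's actual work scales with the size of the dependent cache.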
Is this expected behavior? If so, is there any way we can improve this?
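If the eviction fan-out itself cannot be avoided, one possible mitigation (an assumption on my part, not a verified fix) would be raising the replication timeout on the affected invalidation caches so the sender waits out the eviction storm instead of timing out. The fragment below assumes the default RH-SSO 7 HA profile in standalone-ha.xml, where the "keycloak" cache container defines "realms" and "users" invalidation caches; remote-timeout is in milliseconds, and whether it is accepted on invalidation-cache should be checked against the installed EAP infinispan subsystem schema:

```xml
<cache-container name="keycloak" jndi-name="infinispan/Keycloak">
    <transport lock-timeout="60000"/>
    <!-- raise remote-timeout well above the observed ~1 minute eviction storm -->
    <invalidation-cache name="realms" mode="SYNC" remote-timeout="120000"/>
    <invalidation-cache name="users" mode="SYNC" remote-timeout="120000"/>
</cache-container>
```

This only papers over the symptom (the sender no longer logs a replication timeout); the receiving node would still spend the same time processing the evictions.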
Relates to: RHSSO-324 OutdatedTopologyException when creating realm during cluster node failback/startup (Closed)