Type: Bug
Resolution: Done
Priority: Major
I have a customer who is running into an issue where modifying realm settings causes RH-SSO to perform poorly for 10-30 minutes:
<quote>
Changing certain realm settings occasionally causes Infinispan timeouts on one of the nodes. When this occurs, the affected node becomes non-performant for a period of 10-30 minutes.
Enabling/disabling brute force detection seems to reproduce the issue reliably. Changing other realm settings, such as those under user federation providers, does not reproduce it as frequently. Changing realm clients never seems to reproduce it.
</quote>
I had the customer enable trace-level logging and then reproduce the issue. Our clustering specialist reviewed the logs and made the following observations:
didm0: sends request
2016-10-20 14:19:09,544 DEBUG [org.infinispan.interceptors.InvalidationInterceptor] (default task-50) Cache [aus2-sso-didm0] replicating InvalidateCommand
...
2016-10-20 14:19:09,544 TRACE [org.jgroups.protocols.TCP] (default task-50) aus2-sso-didm0: sending msg to aus2-sso-didm1, src=aus2-sso-didm0, headers are RequestCorrelator: corr_id=200, type=REQ, req_id=14938, rsp_expected=true, FORK: ee:keycloak, UNICAST3: DATA, seqno=17516, conn_id=3, TP: [cluster_name=ee]
didm1: receives request
2016-10-20 14:19:09,548 TRACE [org.jgroups.protocols.TCP] (thread-17,ee,aus2-sso-didm1) aus2-sso-didm1: received [dst: aus2-sso-didm1, src: aus2-sso-didm0 (4 headers), size=73 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER], headers are RequestCorrelator: corr_id=200, type=REQ, req_id=14938, rsp_expected=true, FORK: ee:keycloak, UNICAST3: DATA, seqno=17516, conn_id=3, TP: [cluster_name=ee]
...
2016-10-20 14:19:09,549 TRACE [org.jgroups.blocks.RequestCorrelator] (thread-17,ee,aus2-sso-didm1) calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 14938
didm1: processes request on thread remote-thread--p7-t7
There don't appear to be any long stalls in processing.
didm0: timeout after about 17 seconds
2016-10-20 14:19:27,045 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-50) ISPN000136: Error executing command RemoveCommand, writing keys [dev]: org.infinispan.util.concurrent.TimeoutException: Replication timeout for aus2-sso-didm1
didm1: sends reply over a minute later
2016-10-20 14:20:14,033 TRACE [org.jgroups.blocks.RequestCorrelator] (remote-thread--p7-t7) sending rsp for 14938 to aus2-sso-didm0
...
2016-10-20 14:20:14,033 TRACE [org.jgroups.protocols.TCP] (remote-thread--p7-t7) aus2-sso-didm1: sending msg to aus2-sso-didm0, src=aus2-sso-didm1, headers are RequestCorrelator: corr_id=200, type=RSP, req_id=14938, rsp_expected=true, FORK: ee:keycloak, UNICAST3: DATA, seqno=18246, TP: [cluster_name=ee]
Infinispan is sending an invalidation because an entry was updated.
Keycloak intercepts that single invalidation command (in org.keycloak.models.cache.infinispan.CacheManager)
and triggers almost 17,000 individual evictions, which takes over a minute. But the sender times out
after about 17 seconds, the configured replication timeout, which is normally ample since an invalidation would usually complete very quickly.
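The fan-out the specialist describes can be sketched in plain Java. The names below (FanOutCache, invalidateRealm, the reverse-dependency index) are illustrative stand-ins, not Keycloak's actual CacheManager API; the point is that one invalidation message costs work proportional to the number of dependent cache entries, not to the single change that triggered it:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the observed pattern: a single InvalidateCommand
// for one realm key fans out to an eviction of every cached entry derived
// from that realm.
class FanOutCache {
    private final Map<String, Object> entries = new HashMap<>();
    // Reverse index: realm id -> keys of entries derived from that realm.
    private final Map<String, Set<String>> dependents = new HashMap<>();

    void put(String key, Object value, String realmId) {
        entries.put(key, value);
        dependents.computeIfAbsent(realmId, r -> new HashSet<>()).add(key);
    }

    // Called when one invalidation arrives for a realm; returns how many
    // individual evictions it triggered (~17,000 in the customer's logs).
    int invalidateRealm(String realmId) {
        Set<String> keys = dependents.remove(realmId);
        if (keys == null) {
            return 0;
        }
        for (String key : keys) {
            entries.remove(key); // one eviction per dependent entry
        }
        return keys.size();
    }

    int size() {
        return entries.size();
    }
}
```

This is why the sender's timeout is mismatched: the caller budgets for the cost of one remote invalidation, while the receiver's actual work scales with the size of the dependent cache.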
Is this expected behavior? If so, is there any way we can improve this?
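If the eviction fan-out itself cannot be avoided, one possible mitigation (an assumption on my part, not a verified fix) would be raising the replication timeout on the affected invalidation caches so the sender waits out the eviction storm instead of timing out. The fragment below assumes the default RH-SSO 7 HA profile in standalone-ha.xml, where the "keycloak" cache container defines "realms" and "users" invalidation caches; remote-timeout is in milliseconds, and whether it is accepted on invalidation-cache should be checked against the installed EAP infinispan subsystem schema:

```xml
<cache-container name="keycloak" jndi-name="infinispan/Keycloak">
    <transport lock-timeout="60000"/>
    <!-- raise remote-timeout well above the observed ~1 minute eviction storm -->
    <invalidation-cache name="realms" mode="SYNC" remote-timeout="120000"/>
    <invalidation-cache name="users" mode="SYNC" remote-timeout="120000"/>
</cache-container>
```

This only papers over the symptom (the sender no longer logs a replication timeout); the receiving node would still spend the same time processing the evictions.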
Relates to: RHSSO-324 OutdatedTopologyException when creating realm during cluster node failback/startup (Closed)