RHSSO-514: org.infinispan.util.concurrent.TimeoutException when modifying realm settings


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version: RH-SSO-7.1.0.ER2

      I have a customer who is running into an issue where modifying realm settings causes RH-SSO to perform poorly for 10-30 minutes:

      <quote>
      Changing certain realm settings occasionally causes Infinispan timeouts on one of the nodes. When this occurs, the affected node becomes non-performant for a period of 10-30 minutes.

      Enabling/disabling brute force detection seems to reproduce the issue reliably. Changing other realm settings, such as those under user federation providers, does not seem to produce the issue as frequently. Changing realm clients never seems to reproduce the issue.
      </quote>
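
      For reference, toggling that setting can be scripted with the Keycloak admin client. The sketch below is illustrative only; the server URL, admin credentials, and class name are placeholders, and the realm name "dev" is taken from the logs further down:

      import org.keycloak.admin.client.Keycloak;
      import org.keycloak.representations.idm.RealmRepresentation;

      public class ToggleBruteForce {
          public static void main(String[] args) {
              // Placeholder connection details; adjust for the environment.
              Keycloak keycloak = Keycloak.getInstance(
                      "http://localhost:8080/auth", "master",
                      "admin", "admin", "admin-cli");

              // Flip brute force detection on the realm. Each update causes the
              // modified realm entry to be invalidated across the cluster.
              RealmRepresentation realm = keycloak.realm("dev").toRepresentation();
              realm.setBruteForceProtected(!Boolean.TRUE.equals(realm.isBruteForceProtected()));
              keycloak.realm("dev").update(realm);
          }
      }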

      I had the customer enable trace-level logging and then reproduce the issue. Our clustering specialist reviewed the logs and made the following observations:

      didm0: sends request
      2016-10-20 14:19:09,544 DEBUG [org.infinispan.interceptors.InvalidationInterceptor] (default task-50) Cache [aus2-sso-didm0] replicating InvalidateCommand{keys=[dev]}

      ...
      2016-10-20 14:19:09,544 TRACE [org.jgroups.protocols.TCP] (default task-50) aus2-sso-didm0: sending msg to aus2-sso-didm1, src=aus2-sso-didm0, headers are RequestCorrelator: corr_id=200, type=REQ, req_id=14938, rsp_expected=true, FORK: ee:keycloak, UNICAST3: DATA, seqno=17516, conn_id=3, TP: [cluster_name=ee]

      didm1: receives request
      2016-10-20 14:19:09,548 TRACE [org.jgroups.protocols.TCP] (thread-17,ee,aus2-sso-didm1) aus2-sso-didm1: received [dst: aus2-sso-didm1, src: aus2-sso-didm0 (4 headers), size=73 bytes, flags=OOB|DONT_BUNDLE|NO_TOTAL_ORDER], headers are RequestCorrelator: corr_id=200, type=REQ, req_id=14938, rsp_expected=true, FORK: ee:keycloak, UNICAST3: DATA, seqno=17516, conn_id=3, TP: [cluster_name=ee]
      ...
      2016-10-20 14:19:09,549 TRACE [org.jgroups.blocks.RequestCorrelator] (thread-17,ee,aus2-sso-didm1) calling (org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher) with request 14938

      didm1: processes request on thread remote-thread--p7-t7
      There don't appear to be any long stalls in processing.

      didm0: timeout after about 17 seconds
      2016-10-20 14:19:27,045 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-50) ISPN000136: Error executing command RemoveCommand, writing keys [dev]: org.infinispan.util.concurrent.TimeoutException: Replication timeout for aus2-sso-didm1

      didm1: sends reply over a minute later
      2016-10-20 14:20:14,033 TRACE [org.jgroups.blocks.RequestCorrelator] (remote-thread--p7-t7) sending rsp for 14938 to aus2-sso-didm0
      ...
      2016-10-20 14:20:14,033 TRACE [org.jgroups.protocols.TCP] (remote-thread--p7-t7) aus2-sso-didm1: sending msg to aus2-sso-didm0, src=aus2-sso-didm1, headers are RequestCorrelator: corr_id=200, type=RSP, req_id=14938, rsp_expected=true, FORK: ee:keycloak, UNICAST3: DATA, seqno=18246, TP: [cluster_name=ee]

      Infinispan is sending an invalidation because an entry was updated. Keycloak is intercepting that single invalidation command (in org.keycloak.models.cache.infinispan.CacheManager) and triggering almost 17,000 individual evictions, which takes over a minute. The sender, however, times out after a little over 15 seconds (which is normal, since an invalidation would ordinarily complete very quickly).
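
      To make the failure mode concrete, here is a sketch of the pattern (not Keycloak's actual code; the listener and cache names are made up). A listener on the invalidation cache reacts to one remote InvalidateCommand by walking a local cache and evicting every dependent entry; because the listener runs synchronously on the remote-command thread, the originator's RPC cannot complete until the whole loop finishes:

      import org.infinispan.Cache;
      import org.infinispan.notifications.Listener;
      import org.infinispan.notifications.cachelistener.annotation.CacheEntryInvalidated;
      import org.infinispan.notifications.cachelistener.event.CacheEntryInvalidatedEvent;

      @Listener
      public class RealmInvalidationFanOut {

          private final Cache<String, Object> localCache;

          public RealmInvalidationFanOut(Cache<String, Object> localCache) {
              this.localCache = localCache;
          }

          @CacheEntryInvalidated
          public void onInvalidation(CacheEntryInvalidatedEvent<String, Object> event) {
              String realmKey = event.getKey(); // e.g. "dev" in the logs above
              // One remote invalidation fans out into ~17,000 local evictions.
              // This loop runs on the remote-command thread (remote-thread--p7-t7
              // in the logs), so the sender's ~15s replication timeout expires
              // long before the reply is sent.
              for (String key : localCache.keySet()) {
                  if (key.startsWith(realmKey)) {
                      localCache.evict(key);
                  }
              }
          }
      }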

      Is this expected behavior? If so, is there any way we can improve this?
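
      One knob that seems relevant (an assumption on my part, not a confirmed fix) is the cache's synchronous replication timeout, which defaults to 15 seconds and matches the timeout seen above. With the embedded Infinispan 8.x API the setting looks like the following; in RH-SSO the equivalent attribute would live on the cache definitions in the server's Infinispan subsystem configuration:

      import java.util.concurrent.TimeUnit;

      import org.infinispan.configuration.cache.CacheMode;
      import org.infinispan.configuration.cache.Configuration;
      import org.infinispan.configuration.cache.ConfigurationBuilder;

      public class RaiseReplTimeout {
          public static void main(String[] args) {
              // Raise the sync replication timeout from the 15s default to
              // 2 minutes (an arbitrary example value) so the originator keeps
              // waiting while the receiver works through the eviction storm.
              Configuration cfg = new ConfigurationBuilder()
                      .clustering()
                          .cacheMode(CacheMode.INVALIDATION_SYNC)
                          .sync().replTimeout(2, TimeUnit.MINUTES)
                      .build();
              System.out.println(cfg);
          }
      }

      Raising the timeout would only hide the stall, of course; the underlying cost of ~17,000 synchronous evictions on the remote thread remains.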

            Assignee: Tomas Kyjovsky (tkyjovsk@redhat.com)
            Reporter: Derek Horton (rhn-support-dehort)
            Votes: 0
            Watchers: 5