Cloud Enablement / CLOUD-3104

JGroups clusters but Infinispan cannot propagate cache in K8s


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major

      The cluster uses KUBE_PING:

      Setting JGroups discovery to kubernetes.KUBE_PING with properties {port_range=>1}
      

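      (For context: KUBE_PING discovers cluster members by querying the Kubernetes API for pods, so the keycloak-kubeping service account must be allowed to get and list pods in the namespace. A minimal sketch of the RBAC we assume is already in place, since discovery works; the Role/RoleBinding names are illustrative:)

```yaml
# Hypothetical RBAC for KUBE_PING pod discovery; object names are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: keycloak-kubeping-pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: keycloak-kubeping-pod-reader
subjects:
- kind: ServiceAccount
  name: keycloak-kubeping
roleRef:
  kind: Role
  name: keycloak-kubeping-pod-reader
  apiGroup: rbac.authorization.k8s.io
```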
      The coordinator uses KUBE_PING to form a cluster successfully:

      10:37:50,295 INFO  [org.infinispan.CLUSTER] (thread-9,ejb,keycloak-59c67569f7-wvjmn) ISPN000094: Received new cluster view for channel ejb: [keycloak-59c67569f7-wvjmn|1] (2) [keycloak-59c67569f7-wvjmn, keycloak-59c67569f7-6fdd6]
      10:37:50,299 INFO  [org.infinispan.CLUSTER] (thread-9,ejb,keycloak-59c67569f7-wvjmn) ISPN100000: Node keycloak-59c67569f7-6fdd6 joined the cluster
      [...]
      10:37:52,166 INFO  [org.infinispan.CLUSTER] (remote-thread--p7-t2) [Context=actionTokens] ISPN100002: Starting rebalance with members [keycloak-59c67569f7-wvjmn, keycloak-59c67569f7-6fdd6], phase READ_OLD_WRITE_ALL, topology id 2
      [...]
      10:37:52,174 INFO  [org.infinispan.CLUSTER] (remote-thread--p7-t8) [Context=work] ISPN100002: Starting rebalance with members [keycloak-59c67569f7-wvjmn, keycloak-59c67569f7-6fdd6], phase READ_OLD_WRITE_ALL, topology id 2
      

      The initial cache transfer starts. The coordinator starts getting timeouts:

      10:38:24,999 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p10-t1) ISPN000136: Error executing command RemoveCommand, writing keys [98c9eca3-80c3-47b6-a3a7-59dc2a7207f1]: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 2 from keycloak-59c67569f7-6fdd6
              at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
              at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
              at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      
      10:38:35,036 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p10-t1) ISPN000136: Error executing command PutKeyValueCommand, writing keys [98c9eca3-80c3-47b6-a3a7-59dc2a7207f1]: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 3 from keycloak-59c67569f7-6fdd6
              at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
              at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
              at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

      Meanwhile, the other node gets stuck on the state transfer until it eventually gives up:

      10:37:52,033 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 52) WFLYCLINF0002: Started keys cache from keycloak container
      10:37:52,034 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 53) WFLYCLINF0002: Started realms cache from keycloak container
      10:37:52,036 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 54) WFLYCLINF0002: Started authorization cache from keycloak container
      10:37:52,036 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 61) WFLYCLINF0002: Started users cache from keycloak container
      //It stagnates here
      
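      (While debugging, one way to rule out plain slowness is to raise the Infinispan timeouts via the WildFly CLI. A sketch under assumptions: the cache name and the 10-minute value below are illustrative, and the attribute names should be verified against the server's actual infinispan subsystem model:)

```
# Hypothetical jboss-cli commands; cache name and timeout value are illustrative.
/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:write-attribute(name=remote-timeout, value=600000)
/subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions/component=state-transfer:write-attribute(name=timeout, value=600000)
```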

      The Kubernetes deployment (note that we use the official Docker image, just with a custom theme on top):

      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: keycloak
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: keycloak
        template:
          metadata:
            labels:
              app: keycloak
          spec:
            serviceAccountName: keycloak-kubeping

            volumes:
            - name: google-credentials
              secret:
                secretName: keycloak-google-credentials

            securityContext:
              runAsUser: 1000
              fsGroup: 1000
              runAsNonRoot: true

            affinity:
              nodeAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 10
                  preference:
                    matchExpressions: [{key: priority, operator: In, values: [noncritical]}]

            terminationGracePeriodSeconds: 60

            containers:
            - name: keycloak
              image: eu.gcr.io/boclips-prod/boclips/keycloak:$VERSION

              envFrom:
              - secretRef:
                  name: keycloak

              env:
              - name: KUBERNETES_LABELS
                value: "app=keycloak"
              - name: JGROUPS_DISCOVERY_PROTOCOL
                value: kubernetes.KUBE_PING
              - name: JGROUPS_DISCOVERY_PROPERTIES
                value: "port_range=1"

              ports:
              - name: http
                containerPort: 8080

              livenessProbe:
                httpGet:
                  path: /auth/
                  port: http
                initialDelaySeconds: 120
                timeoutSeconds: 5

              readinessProbe:
                httpGet:
                  path: /auth/
                  port: http
                initialDelaySeconds: 120
                timeoutSeconds: 1

              resources:
                requests:
                  cpu: 2
                  memory: "1024Mi"
                limits:
                  cpu: 2
                  memory: "1024Mi"

            - name: cloud-sql-proxy
              image: gcr.io/cloudsql-docker/gce-proxy:1.12
              securityContext:
                allowPrivilegeEscalation: false

              command: [/cloud_sql_proxy]
              args:
              - -credential_file=/secrets/cloudsql/key.json

              envFrom:
              - configMapRef:
                  name: cloud-sql

              volumeMounts:
              - name: google-credentials
                mountPath: /secrets/cloudsql
                readOnly: true

              resources:
                requests:
                  cpu: "100m"
                  memory: "48Mi"
                limits:
                  cpu: "100m"
                  memory: "48Mi"
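      (One detail worth noting: the pod spec only declares the HTTP port, while JGroups binds its own TCP ports inside the container. Declaring containerPorts is mostly informational for pod-to-pod traffic, but it documents intent and matters for some network-policy setups. A sketch of the ports section, assuming the image's usual JGroups bind ports 7600 (TCP transport) and 57600 (FD_SOCK); verify these against the actual standalone-ha.xml socket bindings before relying on them:)

```yaml
        ports:
        - name: http
          containerPort: 8080
        # Assumed JGroups ports for the official image; illustrative, not confirmed.
        - name: jgroups-tcp
          containerPort: 7600
        - name: jgroups-fd
          containerPort: 57600
```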
      

      Deploying a Keycloak cluster configured to use KUBE_PING forms a cluster successfully, but when Infinispan starts propagating the cache state, it times out.

      The initial cache transfer cannot complete, preventing nodes from becoming healthy.

            Assignee: Unassigned
            Reporter: José Carlos Valero Sánchez