jBPM / JBPM-8588

Gracefully handle the generic KubernetesClientException in OpenShiftStartupStrategy


Details

    Description

      Scenario 1:
      When two or more KIE Server pods are bootstrapped in a multi-pod environment, they can race to create the same ConfigMap, and the following error may show up in the logs:

      19:01:27,997 ERROR [org.kie.server.services.openshift.impl.storage.cloud.KieServerStateOpenShiftRepository] (ServerService Thread Pool -- 76) Processing KieServerState failed.: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/bsig-cloud/configmaps. Message: configmaps "authoring-ha-kieserver" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=configmaps, name=authoring-ha-kieserver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=configmaps "authoring-ha-kieserver" already exists, metadata=ListMeta(_continue=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
      

      This error can be safely ignored: the 409 (AlreadyExists) response means another pod has already created the ConfigMap, and the pods work normally.

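      The proposed change is to add a dedicated catch block in the code path that creates the ConfigMap (the error above is logged by KieServerStateOpenShiftRepository), so that this exception is logged as a warning that can be safely ignored instead of as an error.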
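      A minimal sketch of such a catch, assuming the create call can be wrapped in a Runnable. The KubernetesClientException class below is a local stand-in so the example compiles without the fabric8 jar (the real class also exposes getCode(), returning the HTTP status); ConfigMapCreation and createTolerantOfRace are hypothetical names, not the actual KieServerStateOpenShiftRepository code. Only HTTP 409 (reason AlreadyExists, as in the log above) is downgraded to a warning; any other failure is rethrown.

```java
import java.util.logging.Logger;

// Local stand-in for io.fabric8.kubernetes.client.KubernetesClientException,
// so the sketch compiles without fabric8; getCode() returns the HTTP status.
class KubernetesClientException extends RuntimeException {
    private final int code;

    KubernetesClientException(String message, int code) {
        super(message);
        this.code = code;
    }

    int getCode() {
        return code;
    }
}

// Hypothetical helper: tolerate the ConfigMap creation race between pods.
class ConfigMapCreation {
    private static final Logger LOG = Logger.getLogger(ConfigMapCreation.class.getName());
    static final int HTTP_CONFLICT = 409; // reason=AlreadyExists

    // Returns true if this pod created the ConfigMap, false if another pod
    // created it first (HTTP 409). Any other failure is rethrown unchanged.
    static boolean createTolerantOfRace(String name, Runnable create) {
        try {
            create.run();
            return true;
        } catch (KubernetesClientException e) {
            if (e.getCode() != HTTP_CONFLICT) {
                throw e;
            }
            LOG.warning(() -> "ConfigMap \"" + name + "\" was already created by another"
                    + " KIE Server pod (HTTP 409); this can be safely ignored.");
            return false;
        }
    }
}
```

      Narrowing the catch to 409 keeps genuine failures, such as a 403 caused by missing permissions, visible as errors.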

      For 7.4.0, a Known Issue should be documented telling users that this error message can be safely ignored, because it is caused by a harmless race condition when creating the ConfigMap that KIE Servers use at runtime. The ConfigMap is still created and the pods work as expected.

      Scenario 2:
      Intermittently, the Watcher is closed by a sporadic KubernetesClientException, for example the 'too old resource version' error below:

      12:20:15,553 INFO  [org.kie.server.services.openshift.impl.OpenShiftStartupStrategy] (OkHttp https://172.30.0.1/...) Watcher closed.
      12:20:15,554 INFO  [org.kie.server.services.openshift.impl.OpenShiftStartupStrategy] (OkHttp https://172.30.0.1/...) too old resource version: 750726 (779798)
      

      This may be related to known issues in Kubernetes or in the fabric8 Kubernetes client. Until the lower-level library addresses them, the options on the API client side are:
      Option 1 (Short Term):
      Raise the log level of the message, gracefully terminate the Watcher thread, and recommend recycling the pod.
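      A rough sketch of Option 1, assuming the Watcher contract of the fabric8 client versions where onClose receives a KubernetesClientException and a null cause means a normal close (the exact callback signature differs between fabric8 versions). The exception class is again a local stand-in and StartupWatcherCloseHandler is a hypothetical name. An abnormal close is logged at SEVERE instead of INFO with a pod-recycle recommendation, and a latch is released so the thread that owns the Watcher can finish cleanly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.logging.Level;
import java.util.logging.Logger;

// Local stand-in for io.fabric8.kubernetes.client.KubernetesClientException,
// so the sketch compiles without fabric8; getCode() returns the HTTP status.
class KubernetesClientException extends RuntimeException {
    private final int code;

    KubernetesClientException(String message, int code) {
        super(message);
        this.code = code;
    }

    int getCode() {
        return code;
    }
}

// Hypothetical close handler for the startup strategy's Watcher.
class StartupWatcherCloseHandler {
    private static final Logger LOG = Logger.getLogger(StartupWatcherCloseHandler.class.getName());

    // Released once the watch is gone, so the owning thread can exit cleanly.
    private final CountDownLatch closed = new CountDownLatch(1);

    void onClose(KubernetesClientException cause) {
        if (cause == null) {
            LOG.info("Watcher closed.");
        } else {
            // Escalated from INFO: an abnormal close is not recovered automatically.
            LOG.log(Level.SEVERE, "Watcher closed abnormally (HTTP " + cause.getCode() + "): "
                    + cause.getMessage() + ". This pod no longer receives watch events;"
                    + " recycling the pod is recommended.", cause);
        }
        closed.countDown();
    }

    boolean isClosed() {
        return closed.getCount() == 0;
    }

    void awaitClose() throws InterruptedException {
        closed.await();
    }
}
```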

      Option 2 (Long Term):
      Refactor the Watcher logic out of OpenShiftStartupStrategy into a dedicated component with better resiliency, for example one that can restart the Watcher if it exits abnormally.
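      One possible shape for such a component, sketched under stated assumptions: WatchOpener, RestartingWatch and the stand-in exception are hypothetical, and the opener is assumed to start each watch from a fresh resourceVersion (for example by re-listing first). That matters for the error above: Kubernetes reports 'too old resource version' with HTTP 410 Gone when a watch resumes from a resourceVersion that is no longer available, so re-watching from the stale version would simply fail again. Restarts use exponential backoff and are capped, after which the component falls back to Option 1 behavior and asks for a pod recycle. The sleep is injected so the logic can be exercised without real delays.

```java
import java.util.function.Consumer;
import java.util.function.LongConsumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Local stand-in for io.fabric8.kubernetes.client.KubernetesClientException,
// so the sketch compiles without fabric8; getCode() returns the HTTP status.
class KubernetesClientException extends RuntimeException {
    private final int code;

    KubernetesClientException(String message, int code) {
        super(message);
        this.code = code;
    }

    int getCode() {
        return code;
    }
}

// Hypothetical hook: opens a new watch (assumed to start from a fresh
// resourceVersion) and reports its closure through onClose, where a null
// cause means a normal, client-initiated close.
interface WatchOpener {
    void open(Consumer<KubernetesClientException> onClose);
}

// Hypothetical component that re-opens the watch after abnormal closes,
// backing off exponentially and giving up after maxRestarts in a row.
class RestartingWatch {
    private static final Logger LOG = Logger.getLogger(RestartingWatch.class.getName());

    private final WatchOpener opener;
    private final int maxRestarts;
    private final long baseDelayMillis;
    private final LongConsumer sleeper; // injected so tests need not really sleep
    private int consecutiveRestarts = 0;
    private boolean gaveUp = false;

    RestartingWatch(WatchOpener opener, int maxRestarts, long baseDelayMillis, LongConsumer sleeper) {
        this.opener = opener;
        this.maxRestarts = maxRestarts;
        this.baseDelayMillis = baseDelayMillis;
        this.sleeper = sleeper;
    }

    void start() {
        opener.open(this::onClose);
    }

    // Call after an event is handled successfully so only consecutive failures count.
    void onEventProcessed() {
        consecutiveRestarts = 0;
    }

    boolean hasGivenUp() {
        return gaveUp;
    }

    private void onClose(KubernetesClientException cause) {
        if (cause == null) {
            LOG.info("Watcher closed normally; not restarting.");
            return;
        }
        if (consecutiveRestarts >= maxRestarts) {
            gaveUp = true;
            LOG.log(Level.SEVERE, "Watcher closed abnormally " + (consecutiveRestarts + 1)
                    + " times in a row; giving up. Recycling the pod is recommended.", cause);
            return;
        }
        // base, 2x base, 4x base, ... (shift capped to avoid overflow)
        long delay = baseDelayMillis * (1L << Math.min(consecutiveRestarts, 10));
        consecutiveRestarts++;
        LOG.log(Level.WARNING, "Watcher closed abnormally (HTTP " + cause.getCode() + "): "
                + cause.getMessage() + "; restarting in " + delay + " ms.", cause);
        // In production this would likely be scheduled on an executor rather
        // than run (and slept) on the client's callback thread.
        sleeper.accept(delay);
        start();
    }
}
```

      Resetting the counter through onEventProcessed() after a healthy event means that only consecutive failures count towards the cap.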

            People

              Filippe Spolti (rhn-support-fspolti)
              Ricardo Zanini Fernandes (rhn-support-zanini)
              Jakub Schwan
              Votes: 0
              Watchers: 2