WildFly Core / WFCORE-4519

Slave Host Controller deployment repository is cleaned after a full deployment replacement


Details

    • Workaround Exists

      As a workaround to avoid hitting this issue, the server-group:replace-deployment operation can be used instead of deploy --force to update content in the server groups. For example:

      [domain@localhost:9990 /] deploy /applications/test-application.war --name=test-application-v2.war --disabled
      
      [domain@localhost:9990 /] /server-group=main-server-group:replace-deployment(name=test-application-v2.war, runtime-name=test-application.war, to-replace=test-application.war)
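
      Once the replacement is active, the old deployment resource can be removed so that its content is cleaned up in a controlled way. A minimal follow-up sketch, assuming test-application.war is no longer assigned to any server group:

      [domain@localhost:9990 /] /deployment=test-application.war:remove()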
      

    Description

      In domain mode, there is a cleanup task that removes obsolete content from the deployment repository of each process (DC, slave HC, and servers). By default, this task is executed every five minutes.

      The task checks whether there is any content to be marked as obsolete; if there is, that content is marked and then deleted on the next task execution.

      Deployment content is considered obsolete on a slave HC if there are no references to it, that is, if no server group has this deployment configured.
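
      The references that keep content alive can be inspected from the CLI. A small sketch, assuming the deployment and server-group names used in the workaround above:

      [domain@localhost:9990 /] /deployment=test-application.war:read-attribute(name=content)
      [domain@localhost:9990 /] /server-group=main-server-group/deployment=test-application.war:read-resource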

      The issue here is that the deployment handler that replaces the deployment content on a slave does not add a reference to the new content when there are affected server groups.
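
      For context, deploy --force in domain mode is translated by the CLI into the full-replace-deployment operation on the domain root, which is the path handled by the affected handler. A hedged example; the file URL is an assumption and must be readable by the DC:

      [domain@localhost:9990 /] /:full-replace-deployment(name=test-application.war, content=[{url="file:/applications/test-application.war"}])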

      The consequence is that the cleanup task can delete the slave HC content. If this occurs while the servers are starting, the servers can fail to start with the following error:

      2019-06-12 08:51:32,813 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("add") failed - address: ([("deployment" => "test-application.war")]) - failure description: "WFLYSRV0137: No deployment content with hash b1fb3b872b3490bbdbd152bd082791b1f170397d is available in the deployment content repository for deployment 'test-application.war'. This is a fatal boot error. To correct the problem, either restart with the --admin-only switch set and use the CLI to install the missing content or remove it from the configuration, or remove the deployment from the xml configuration file and restart."
      2019-06-12 08:51:32,817 FATAL [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0056: Server boot has failed in an unrecoverable manner; exiting. See previous messages for details.
      2019-06-12 08:51:32,833 INFO  [org.jboss.as] (MSC service thread 1-4) WFLYSRV0050: WildFly Full 17.0.0.Final-SNAPSHOT (WildFly Core 9.0.1.Final-SNAPSHOT) stopped in 5ms
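
      To relate the hash in the error to the domain configuration, the expected hash can be read on the DC and compared with the slave's content repository on disk. A diagnostic sketch; the on-disk layout below is an assumption based on the default data/content structure:

      # On the DC: read the hash the domain configuration expects for the deployment
      [domain@localhost:9990 /] /deployment=test-application.war:read-attribute(name=content)
      # On the slave host, the matching bytes should exist under
      # domain/data/content/<first-two-hash-characters>/<remaining-hash-characters>/content;
      # if that directory is missing, the cleanup task has already removed the content.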
      

      The issue is difficult to hit because it is the server that requests the required files from the slave HC. To reproduce it, a coincidence is required: the server has requested a deployment file from its HC, the HC already has that file in its deployment repository marked as obsolete, and, before sending it to the server, the cleanup task removes it.
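
      For completeness, the problematic sequence can be sketched from the CLI as follows (the host and server names are assumptions, and the timing window described above still has to be hit):

      # Initial deployment, assigned to a server group
      [domain@localhost:9990 /] deploy /applications/test-application.war --server-groups=main-server-group
      # Full content replacement, the path that loses the reference on the slave HC
      [domain@localhost:9990 /] deploy /applications/test-application.war --force
      # Within the following cleanup cycles, restart a server managed by the slave HC
      [domain@localhost:9990 /] /host=slave/server-config=server-one:stop(blocking=true)
      [domain@localhost:9990 /] /host=slave/server-config=server-one:start(blocking=true)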

          People

            Assignee: Yeray Borges Santana (yborgess1@redhat.com)
            Reporter: Yeray Borges Santana (yborgess1@redhat.com)