WildFly Core / WFCORE-4519

Slave Host Controller deployment repository is cleaned after a full deployment replacement


Details

    • Workaround Exists

      As a workaround to avoid hitting this issue, the server-group:replace-deployment operation can be used instead of deploy --force to update content in the server groups. For example:

      [domain@localhost:9990 /] deploy /applications/test-application.war --name=test-application-v2.war --disabled
      
      [domain@localhost:9990 /] /server-group=main-server-group:replace-deployment(name=test-application-v2.war, runtime-name=test-application.war, to-replace=test-application.war)
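
      Once the replacement is active, the old deployment resource can be removed so that its content is cleaned up in a controlled way. A minimal follow-up sketch, assuming test-application.war is no longer assigned to any server group:

      [domain@localhost:9990 /] /deployment=test-application.war:remove()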
      

    Description

      In domain mode, there is a cleanup task that removes obsolete content from the deployment repository of each process (DC, slave HC, and servers). By default, this task is executed every five minutes.

      The task checks whether there is any content to be marked as obsolete; if there is, that content is marked and then deleted on the next task execution.

      Deployment content is considered obsolete on a slave HC if there are no references to it, that is, if no server group has this deployment configured.
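
      The references that keep content alive can be inspected from the CLI. A small sketch, assuming the deployment and server-group names used in the workaround above:

      [domain@localhost:9990 /] /deployment=test-application.war:read-attribute(name=content)
      [domain@localhost:9990 /] /server-group=main-server-group/deployment=test-application.war:read-resource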

      The issue here is that the deployment handler that replaces the deployment content on a slave does not add a reference to the new content when there are affected server groups.
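
      For context, deploy --force in domain mode is translated by the CLI into the full-replace-deployment operation on the domain root, which is the path handled by the affected handler. A hedged example; the file URL is an assumption and must be readable by the DC:

      [domain@localhost:9990 /] /:full-replace-deployment(name=test-application.war, content=[{url="file:/applications/test-application.war"}])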

      The consequence is that the cleanup task can delete the slave HC content. If this occurs while the servers are starting, the servers can fail to start with the following error:

      2019-06-12 08:51:32,813 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("add") failed - address: ([("deployment" => "test-application.war")]) - failure description: "WFLYSRV0137: No deployment content with hash b1fb3b872b3490bbdbd152bd082791b1f170397d is available in the deployment content repository for deployment 'test-application.war'. This is a fatal boot error. To correct the problem, either restart with the --admin-only switch set and use the CLI to install the missing content or remove it from the configuration, or remove the deployment from the xml configuration file and restart."
      2019-06-12 08:51:32,817 FATAL [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0056: Server boot has failed in an unrecoverable manner; exiting. See previous messages for details.
      2019-06-12 08:51:32,833 INFO  [org.jboss.as] (MSC service thread 1-4) WFLYSRV0050: WildFly Full 17.0.0.Final-SNAPSHOT (WildFly Core 9.0.1.Final-SNAPSHOT) stopped in 5ms
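
      To relate the hash in the error to the domain configuration, the expected hash can be read on the DC and compared with the slave's content repository on disk. A diagnostic sketch; the on-disk layout below is an assumption based on the default data/content structure:

      # On the DC: read the hash the domain configuration expects for the deployment
      [domain@localhost:9990 /] /deployment=test-application.war:read-attribute(name=content)
      # On the slave host, the matching bytes should exist under
      # domain/data/content/<first-two-hash-characters>/<remaining-hash-characters>/content;
      # if that directory is missing, the cleanup task has already removed the content.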
      

      The issue is difficult to hit because it is the server that requests the required files from the slave HC. To reproduce it, a coincidence is required: the server has requested a deployment file from its HC, the HC already has that file in its deployment repository marked as obsolete, and, before sending it to the server, the cleanup task removes it.
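
      For completeness, the problematic sequence can be sketched from the CLI as follows (the host and server names are assumptions, and the timing window described above still has to be hit):

      # Initial deployment, assigned to a server group
      [domain@localhost:9990 /] deploy /applications/test-application.war --server-groups=main-server-group
      # Full content replacement, the path that loses the reference on the slave HC
      [domain@localhost:9990 /] deploy /applications/test-application.war --force
      # Within the following cleanup cycles, restart a server managed by the slave HC
      [domain@localhost:9990 /] /host=slave/server-config=server-one:stop(blocking=true)
      [domain@localhost:9990 /] /host=slave/server-config=server-one:start(blocking=true)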

          People

            Assignee: Yeray Borges Santana (yborgess1@redhat.com)
            Reporter: Yeray Borges Santana (yborgess1@redhat.com)