Details
Enhancement
Resolution: Unresolved
Major
Documentation (Ref Guide, User Guide, etc.), Release Notes, Migration, Compatibility/Configuration
Description
I run JBoss / WildFly as a Windows service (wildfly-service.exe, aka prunsrv.exe, aka Apache Commons Daemon Procrun) and face multiple issues that prevent Windows service recovery actions from working as expected. The Windows Service Control Manager (SCM) does not recognize that the WildFly Windows service has failed when WildFly crashes (for example, on an OOM with the -XX:+CrashOnOutOfMemoryError JVM option), so it does not execute recovery actions at all.
Below is the list of issues I found to be the root cause:
- WildFly (JBoss) launch scripts (domain.bat and standalone.bat) sometimes don't return the exit code of the JVM to the caller (this depends on how the scripts are launched). They should explicitly use
exit /B my_exit_code
to always return the exit code to the caller.
- Procrun (wildfly-service.exe) reports the stopped state of the Windows service even if the JVM process stops with a non-zero exit code (although this exit code is still returned to SCM).
- Neither Procrun nor the WildFly service.bat script turns on the failure actions flag for the Windows service they install. Because this flag is off, SCM doesn't treat a stopped state reported with a non-zero exit code as a service failure.
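For reference, the failure actions flag and the recovery actions can be inspected and set by hand with the standard sc utility from an elevated command prompt. This is only a sketch: the service name WildFly, the reset period and the restart delay are assumed example values, not something service.bat configures today.

```bat
rem Assumed service name "WildFly"; replace with the name used at install time.
rem Show the current recovery actions and the failure actions flag.
sc qfailure WildFly
sc qfailureflag WildFly

rem Example recovery action: restart after 60 seconds, reset the failure count after one day.
rem Note that sc requires the space after "reset=" and "actions=".
sc failure WildFly reset= 86400 actions= restart/60000

rem Turn the failure actions flag on, so that a stop with a non-zero exit code counts as a failure.
sc failureflag WildFly 1
```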
I suggest the following:
- Change service.bat to turn on the failure actions flag for the service installed by Procrun.
- Change service.bat to add a flag (environment variable) indicating that WildFly is running as a Windows service. This flag is needed to transform the exit code: we cannot use exit codes 1..15999, because Procrun doesn't define its own error messages and SCM treats the exit code reported by Procrun as a standard Windows System Error Code. The exit code reported by Procrun (the exit code of the WildFly launch script) therefore needs to be modified so that it doesn't overlap with existing Windows System Error Codes.
- Change standalone.bat and domain.bat (PowerShell scripts are not used for Windows services) to explicitly return a non-zero exit code in case of errors. This exit code should be adjusted if WildFly runs as a Windows service (refer to the additional flag described above, which service.bat introduces as a Procrun start parameter).
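The launch script change could look roughly like the sketch below. It is an illustration under assumptions, not the code from the pull request: the variable names RUNNING_AS_SERVICE and SERVICE_EXIT_CODE_OFFSET, the offset value 16000 and the simplified java command line are all hypothetical.

```bat
@echo off
setlocal

rem Hypothetical offset that moves exit codes out of the 1..15999 range.
set "SERVICE_EXIT_CODE_OFFSET=16000"

rem Simplified stand-in for the real JVM launch line of standalone.bat / domain.bat.
"%JAVA%" %JAVA_OPTS% -jar "%JBOSS_HOME%\jboss-modules.jar" %*

rem Capture the JVM exit code immediately, before another command overwrites ERRORLEVEL.
set "EXIT_CODE=%ERRORLEVEL%"

rem RUNNING_AS_SERVICE is a hypothetical flag assumed to be set by service.bat.
rem Shift the code only when running as a service and only if it falls into 1..15999.
if defined RUNNING_AS_SERVICE if %EXIT_CODE% GEQ 1 if %EXIT_CODE% LEQ 15999 set /A EXIT_CODE+=SERVICE_EXIT_CODE_OFFSET

rem Return the code explicitly; %EXIT_CODE% is expanded before endlocal discards it.
endlocal & exit /B %EXIT_CODE%
```

Outside of a service the script returns the JVM exit code unchanged, so existing callers that check for specific codes are not affected.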
Refer to pull request #3293 in the wildfly/wildfly-core GitHub project.