
WildFly eats the CPU up to 100% and does not respond

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version: 11.0.0.Alpha1
    • Affects Version: 10.1.0.Final
    • Component: None
    • Labels: None

      Hi!

      I have a JAX-RS application, and after a lot of load one thread eats up the CPU (100% usage) even after the load-test app is terminated, and it never drops until I restart the app server.
      This causes me a lot of headaches because it makes the app server totally unusable until I restart it, and users are unable to use the app in the meanwhile.

      I've attached the stack trace, but unfortunately I don't see anything wrong in it, as far as my knowledge goes.
      Please help me; I can provide more information if necessary.

        1. Screen Shot 2016-10-07 at 20.18.27.png (1.12 MB)
        2. Screen Shot 2016-10-07 at 20.18.31.png (1.11 MB)
        3. wildfly-after-hang.txt (179 kB)
        4. wildfly-after-hang+1min.txt (181 kB)
        5. wildfly-hang.txt (182 kB)
        6. wildfly-hang-real-stacktrace.txt (100 kB)

            [WFLY-7275] WildFly eats the CPU up to 100% and does not respond

            Kabir Khan added a comment -

            Bulk closing issues that were resolved against old releases


            Christian Bjørnbak (Inactive) added a comment -

            franklangelage I see your point, but thanks for your pointers to a solution!

            Frank Langelage (Inactive) added a comment -

            In general I agree with you; I would also like to see public patches for the open-source releases.
            But it is also understandable that JBoss / Red Hat, as a commercial company, has to earn money.
            So you might have a look at the commercial JBoss version, the Enterprise Application Platform.
            Otherwise you can hot-fix your current version:

            • Download xnio-nio-3.4.x.Final.jar and xnio-api-3.4.x.Final.jar (x >= 1).
            • Put these files into the folders $JBOSS_HOME/modules/system/layers/base/org/jboss/xnio/main and $JBOSS_HOME/modules/system/layers/base/org/jboss/xnio/nio/main.
            • Modify module.xml in both directories to load the new version of the files instead of 3.4.0.
            • Restart JBoss.
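            Frank's steps can be rehearsed with a short shell sketch before touching a live install. The module paths follow his comment; the target version 3.4.1.Final and the trimmed module.xml are assumptions for illustration, and everything runs in a scratch directory so no real $JBOSS_HOME is modified:

```shell
# Rehearsal of the module.xml hot-fix on a scratch copy of the module layout.
# NEW (3.4.1.Final) and the sample module.xml content are assumptions; adapt
# them to the XNIO release you actually download.
NEW=3.4.1.Final
WORK=$(mktemp -d)
MOD="$WORK/modules/system/layers/base/org/jboss/xnio/main"
mkdir -p "$MOD"
# Stand-in for the real module.xml (the real file lists more resources):
cat > "$MOD/module.xml" <<'EOF'
<module xmlns="urn:jboss:module:1.3" name="org.jboss.xnio">
    <resources>
        <resource-root path="xnio-api-3.4.0.Final.jar"/>
    </resources>
</module>
EOF
# After copying xnio-api-$NEW.jar into the directory, point the module at it:
sed -i "s/xnio-api-3\.4\.0\.Final\.jar/xnio-api-$NEW.jar/" "$MOD/module.xml"
grep resource-root "$MOD/module.xml"
```

            The same substitution, run in the .../xnio/nio/main directory with xnio-nio in the pattern, completes the hot fix; then restart the server.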

            Christian Bjørnbak (Inactive) added a comment -

            I am experiencing the same problem. A WildFly 11 GA is not coming anytime soon. Shouldn't a serious bug like this be backported and released in a WildFly 10.2?

            Marek Kopecky added a comment -

            PR is linked in XNIO-276: https://github.com/xnio/xnio/pull/97

            Giovanni Lovato (Inactive) added a comment -

            I'm experiencing this too; can you please link your PR? Thank you!

            Stuart Douglas (Inactive) added a comment -

            I have submitted an XNIO pull request that should fix this. I think the problem is caused by the use of CONNECTION_HIGH_WATER to control the maximum number of connections.

            Krisztian Kocsis (Inactive) added a comment -

            OK, now I have uploaded the real stack trace, wildfly-hang-real-stacktrace.txt. Unfortunately, the hung threads are in compiled code, except 2-3 which are in:

            • org.xnio.ChannelListener.invokeChannelListener(java.nio.channels.Channel, org.xnio.ChannelListener)
            • org.xnio.nio.QueuedTcpServer$1.run()

            Krisztian Kocsis (Inactive) added a comment -

            Note that 2 months ago I was able to run h2load -n 1000000 -c 20000 without any errors against a Vert.x (Netty) server on the same machine.

            Krisztian Kocsis (Inactive) added a comment -

            So, I ran a test with *h2load -n 5000 -c 1000 https://...*. After that, even -c 10 (10 concurrent connections) does not work. I've attached the screenshots of htop after the load test, plus two stack traces: one taken directly after the hang and one a minute later (the server cannot recover without a restart). SSL config: TLS 1.2 only (ciphers: AES128GCM and ECDHE+RSA only). Note that the scheduled EJB task is still executed properly every 2 minutes (I checked the log files); only the web interface is dead, but it is dead completely.
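            The per-thread CPU picture Krisztian captured in htop can also be recorded non-interactively, which helps when the hang outlives the terminal session. A minimal sketch, assuming Linux procps; the PID would normally be the WildFly process id, and it only falls back to the current shell here so the snippet runs standalone:

```shell
# Take three snapshots of per-thread (LWP) CPU usage for a process, so the
# id of a spinning thread is preserved for later matching against jstack.
PID=${1:-$$}                         # pass the WildFly pid as $1; $$ is a stand-in
for i in 1 2 3; do
  ps -L -o lwp,pcpu,comm -p "$PID"   # one line per thread: id, %CPU, command
  sleep 1
done > thread-cpu.txt
grep -c LWP thread-cpu.txt           # header count confirms three snapshots
```

            A thread that stays near 100% across all snapshots is the one worth looking up in the stack traces.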

            Krisztian Kocsis (Inactive) added a comment -

            @David: I am trying to reproduce the issue; it takes some time. The main thread was shown as 10108 in htop, using 100%, but I think there was another thread too (I don't remember exactly).

            @James: Yes, I run a periodic (every 2 minutes) EJB task that imports data from one datasource to another datasource.

            James Perkins added a comment -

            Are you using a scheduled job with the ManagedScheduledExecutorService? I didn't look at the thread dump in detail, but I noticed some parked EE concurrency threads. They don't look unusual; I'm just trying to get an idea of what your application might be doing.

            David Lloyd added a comment -

            What thread? You can find out by using "ps" or "top" to display the PID of the thread, then converting that value to hexadecimal and searching for it in the output of jstack.
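            David's recipe can be sketched in a few shell lines. The thread id 10108 is the one Krisztian later reported seeing in htop; the jstack call is left commented out because it needs a live JVM pid:

```shell
# Map a hot OS thread to its Java stack:
#  1. top -H -p <jvm-pid> (or htop) shows the busy thread's decimal id (LWP)
#  2. convert that id to hex; jstack tags every thread with nid=0x<hex>
#  3. grep the jstack output for that nid
TID=10108                          # decimal thread id seen in htop
HEX=$(printf 'nid=0x%x' "$TID")    # 10108 -> nid=0x277c
echo "$HEX"
# jstack <jvm-pid> | grep -A 20 "$HEX"   # show that thread's stack frames
```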

              Assignee: Stuart Douglas (Inactive) (sdouglas1@redhat.com)
              Reporter: Krisztian Kocsis (Inactive) (krisztian.kocsis_jira)
              Votes: 0
              Watchers: 14
