  1. Application Server 3 4 5 and 6
  2. JBAS-2628

Website deployment on 4.0.3 SP1 becomes unresponsive over time, stack trace shows most threads "waiting on monitor Entry"

Details

    • Bug
    • Resolution: Can't Do
    • Major
    • None
    • JBossAS-4.0.3 SP1
    • None
    • None

    Description

      This issue involves the production instance of 12 international web sites deployed on JBoss 4.0.3 SP1. These sites ran on JBoss 3.2.3 and JDK 1.4 for some months prior to a recent platform upgrade and were stable during that time. Since the upgrade we have experienced a periodic degradation in performance that requires a restart of the application server to correct. This occurs about every 12 hours (give or take, depending on the occurrence of a particular type of activity discussed further below). The sites and server appear to be healthy most of the time. However, once this situation starts, it's not long before additional requests increase the thread count and CPU utilization. The sites become slow to respond. With a little more time (more requests, actually, again resulting in a higher thread count) they lose their ability to respond at all. Once in this state, requests to port 80 and port 8080 never return.

      This issue can be duplicated by running our link checker against all 12 web sites deployed on our QA system. This activity, it seems, would be similar to a crawler indexing our production sites. The assumption is that this type of activity is responsible for initiating whatever starts this downward spiral in performance, ending in no response from the server. From testing, it seems this is not necessarily related to the level of concurrent activity. If we link check one site at a time, we still end up in this state after a number of sites. Note that if we link check the same site repeatedly, the server does not progress to this bad state. Instead, some extra threads are created on the first run, and subsequent runs do not increase the thread count or affect performance. Of course, when we link check all 12 sites at once, threads pile up, performance degrades, and the sites become unresponsive.

      When reviewing the Apache access logs (in production) for the general timeframe of a failure, numerous requests from one site crawler or another can be seen. Prior to going live with this upgrade, JMeter was used to apply a heavy load to the sites, exercising numerous pages across all sites and many of the programmatic features. This ran for 3 days non-stop without issue. Under load, we saw the thread count go to about 110 and then stabilize for the duration. The difference, it seems, is the breadth of the link checking rather than the load. Our sites comprise 3000 pages, and the load test hit a relatively small portion of these.

      Our environment uses 2-CPU servers. While in this bad state, one CPU's worth of capacity is maxed out at nearly 100%. That is, one CPU could be at 90+% and the other near 0%, or they could share the load more evenly. Regardless of the breakdown, CPU utilization as a whole does not exceed half of the total 2-CPU capacity (as determined using the top command).
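
      A ceiling of one CPU's worth of utilization is what a single busy thread would produce, although the report does not establish that this is the cause. As a minimal sketch, assuming the upgraded platform runs on a Java 5 or later JVM (the report does not state the JDK version), the standard java.lang.management API can rank live Java threads by consumed CPU time. The class name BusyThreads and its methods are hypothetical, and the code would have to run inside the server JVM, for example from a diagnostic JSP or servlet.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of JBoss: ranks live Java threads by CPU time.
public class BusyThreads {

    /** One sample: a thread name and the total CPU time it has consumed. */
    public static final class Sample {
        public final String name;
        public final long cpuNanos;

        Sample(String name, long cpuNanos) {
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns up to 'limit' threads, highest CPU time first. */
    public static List<Sample> topByCpu(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Sample> samples = new ArrayList<Sample>();
        if (!mx.isThreadCpuTimeSupported()) {
            return samples; // this JVM cannot measure per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);      // -1 if the thread has died
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread has died
            if (cpu < 0 || info == null) {
                continue;
            }
            samples.add(new Sample(info.getThreadName(), cpu));
        }
        Collections.sort(samples, new Comparator<Sample>() {
            public int compare(Sample a, Sample b) {
                // Descending order by CPU time.
                return a.cpuNanos < b.cpuNanos ? 1 : (a.cpuNanos > b.cpuNanos ? -1 : 0);
            }
        });
        if (samples.size() > limit) {
            return new ArrayList<Sample>(samples.subList(0, limit));
        }
        return samples;
    }

    public static void main(String[] args) {
        for (Sample s : topByCpu(10)) {
            System.out.println((s.cpuNanos / 1000000L) + " ms  " + s.name);
        }
    }
}
```

      Taking two readings a minute apart and comparing them would show which threads are consuming CPU right now rather than over their lifetime. This view covers Java threads only; time spent in garbage collection or other VM-internal threads does not appear here.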

      When in this bad state, the thread count (in the JVM stack trace) is high, often over 200. No deadlocks are indicated in the JVM stack trace. When the system is healthy, we typically see 80 threads or so, depending on load. I took 3 traces a few minutes apart as system performance went from bad to worse. The majority of threads in these traces are "waiting on monitor entry". Based on the thread traces, threads appear to be executing or waiting at a low level in the code (below our application). Typically there are only a few threads (outside of the administrative threads) in the runnable state. These runnable threads appear to be in normal processing based on their stack traces; nothing stands out. However, something is utilizing CPU.
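
      When most threads are blocked on monitor entry, grouping them by the monitor they are waiting for can show whether they are all queued behind one lock and which thread holds it. The following is a minimal sketch of such a summary, run offline on a Java 8 or later JVM. It assumes the dumps use the usual HotSpot text layout, in which each thread section starts with the quoted thread name and stack frames carry "- waiting to lock <address>" and "- locked <address>" lines; the format of the attached dumps has not been verified, and the class name MonitorWaiters is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical offline helper: summarizes monitor contention in a thread dump.
public class MonitorWaiters {
    // Assumed HotSpot layout: a thread section begins with the quoted thread name.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\".*");
    private static final Pattern WAITING = Pattern.compile("- waiting to lock <(0x[0-9a-fA-F]+)>");
    private static final Pattern LOCKED = Pattern.compile("- locked <(0x[0-9a-fA-F]+)>");

    /** Maps monitor address to the names of threads blocked trying to enter it. */
    public static Map<String, List<String>> waitersByMonitor(List<String> lines) {
        return group(lines, WAITING);
    }

    /**
     * Maps monitor address to the names of threads showing it as locked.
     * A thread parked in Object.wait() may also be listed even though it has
     * temporarily released the monitor.
     */
    public static Map<String, List<String>> holdersByMonitor(List<String> lines) {
        return group(lines, LOCKED);
    }

    private static Map<String, List<String>> group(List<String> lines, Pattern marker) {
        Map<String, List<String>> result = new TreeMap<>();
        String current = null; // name of the thread whose section is being read
        for (String line : lines) {
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                current = header.group(1);
                continue;
            }
            Matcher m = marker.matcher(line);
            if (current != null && m.find()) {
                result.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(current);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to a character, so stray bytes cannot fail the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        Map<String, List<String>> waiters = waitersByMonitor(lines);
        Map<String, List<String>> holders = holdersByMonitor(lines);
        List<Map.Entry<String, List<String>>> ranked = new ArrayList<>(waiters.entrySet());
        ranked.sort((a, b) -> b.getValue().size() - a.getValue().size());
        for (Map.Entry<String, List<String>> e : ranked) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " waiting, held by "
                    + holders.getOrDefault(e.getKey(), Collections.emptyList()));
        }
    }
}
```

      Run as `java MonitorWaiters dump1` against each attached dump in turn. If the same holder thread shows up at the top of all three, its stack trace is the first place to look.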

      Attachments

        1. dump1
          560 kB
        2. dump2
          661 kB
        3. dump3
          660 kB

        Activity

          People

            Assignee: Unassigned
            Reporter: atait (Inactive)
