WildFly / WFLY-2500

Unable to reliably deduce metric value trends


    • Type: Feature Request
    • Resolution: Cannot Reproduce
    • Priority: Major
    • 8.0.0.Beta1

      The metrics that WildFly collects are runtime values that are reset with each restart of the server.

      This makes it impossible for remote tools that read those values only from the management model to reliably deduce changes in their values across server restarts.

      Consider for example the following scenario. We're trying to detect the number of invocations of some EJB method.

      1) collection1: invocations = 1000
      2) collection2: invocations = 1010
      3) server restarts
      4) huge spike in number of calls to the EJB method
      5) collection3: invocations = 1020

      In the above, each "collectionN" represents a point in time when the user reads the value of the metric from the management model (by making a fresh connection to it and reading the value). To the user, it would seem that the value increases slowly (by 10 between consecutive collections), when the exact opposite is true: between collection2 and collection3, after the restart reset the counter to 0, there was a spike of 1020 invocations.
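The misreading can be sketched in a few lines of Python. This is only an illustration of what a monitoring tool might do, not code from WildFly or RHQ; the function name `naive_deltas` is hypothetical, and the readings are the ones from the scenario above.

```python
def naive_deltas(readings):
    """Difference between consecutive counter readings, ignoring restarts."""
    return [curr - prev for prev, curr in zip(readings, readings[1:])]


# collection1, collection2, collection3 from the scenario above
readings = [1000, 1010, 1020]

# The tool sees two small, equal increases and no sign of the restart or the spike.
print(naive_deltas(readings))  # [10, 10]
```

With only the counter values available, the tool has no way to tell this sequence apart from one where the server never restarted.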

      One of the simpler (IMHO) ways of fixing this would be to add a new runtime attribute, startup-date, to the root of the management model. It would be populated at server startup with the date and time the server started. This assumes that there really is no way of resetting the metric values to 0 at runtime; for example, I tried disabling and re-enabling statistics, which DIDN'T reset the value.

      Users could then record the value of startup-date with each collection of the runtime metrics. If startup-date has changed since the last collection, the tool knows that the server restarted and can adapt its calculations accordingly.

      The example above would then look like this (the dates are shown as simplified timestamps):
      1) collection1: invocations = 1000, startup-date=1
      2) collection2: invocations = 1010, startup-date=1
      3) server restarts
      4) huge spike in number of calls to the EJB method
      5) collection3: invocations = 1020, startup-date=2

      It is then apparent that collection3 represents a spike because the user can deduce that the 1020 invocations were counted from 0 starting at startup-date=2.
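A minimal sketch of how a collecting tool might use the proposed attribute is shown below. It assumes the startup-date attribute exists as described in this request (it is a proposal, not an existing WildFly attribute), and the names `Sample` and `restart_aware_deltas` are hypothetical. After a restart, the new counter value is only a lower bound on the real change, since invocations made between the previous collection and the restart are lost.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    invocations: int   # counter value read from the management model
    startup_date: int  # value of the proposed startup-date attribute


def restart_aware_deltas(samples: List[Sample]) -> List[int]:
    """Per-interval change in the counter, treating a changed startup date as a reset to 0."""
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        if curr.startup_date != prev.startup_date:
            # Server restarted: the counter started again from 0, so the
            # current value is itself the (lower-bound) delta.
            deltas.append(curr.invocations)
        else:
            deltas.append(curr.invocations - prev.invocations)
    return deltas


samples = [Sample(1000, 1), Sample(1010, 1), Sample(1020, 2)]
print(restart_aware_deltas(samples))  # [10, 1020]
```

The second interval now shows up as the spike of 1020 invocations instead of an increase of 10.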

      (Note that this affects RHQ, whose monitoring is limited by its inability to deduce trends across restarts.)

            Assignee: Unassigned
            Reporter: Lukas Krejci (rhn-engineering-lkrejci)