Caucho maker of Resin Server | Application Server (Java EE Certified) and Web Server


 

Resin Documentation


health meters


Health meters are a simple way to create visually pleasing graphs in /resin-admin.

Configuration

health.xml

Meters are configured in health.xml, which uses CanDI to create and update Java objects. Refer to health checking configuration for a full description of health.xml. Resin 4.0.17 and later include a full complement of pre-configured JMX meters in health.xml.

Example: importing health.xml into resin.xml
<resin xmlns="http://caucho.com/ns/resin"
       xmlns:resin="urn:java:com.caucho.resin">
  <cluster-default>  
    ...
    <!--
       - Admin services
      -->
    <resin:DeployService/>
    
    <resin:if test="${resin.professional}">
      <resin:AdminServices/>
    </resin:if>

    <!--
       - Configuration for the health monitoring system
      -->
    <resin:if test="${resin.professional}">
      <resin:import path="${__DIR__}/health.xml" optional="true"/>
    </resin:if>
    ...
  </cluster-default>
</resin>

Note: <resin:AdminServices/> (or more precisely just <resin:StatsService/>) is required to support health meters and graphing.
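
The note above implies that <resin:StatsService/> alone is enough when only meters and graphing are needed. The following is a minimal sketch of that variant, built only from the elements already shown in the import example. It has not been verified against a specific Resin release, and keeping it inside the resin.professional test is an assumption carried over from the original example.

```xml
<resin xmlns="http://caucho.com/ns/resin"
       xmlns:resin="urn:java:com.caucho.resin">
  <cluster-default>
    <resin:if test="${resin.professional}">
      <!-- StatsService alone, in place of the full AdminServices bundle -->
      <resin:StatsService/>

      <!-- Meter definitions still come from health.xml -->
      <resin:import path="${__DIR__}/health.xml" optional="true"/>
    </resin:if>
  </cluster-default>
</resin>
```

Other /resin-admin features that depend on the remaining admin services are outside the scope of this sketch.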

Meter names

Health meters are named using a concatenation of keys separated by pipe (|) characters, loosely organized from least specific to most specific. Because meter statistics are shared among all members of a Resin cluster, Resin automatically prefixes each meter name with the cluster node index to ensure the name is unique across cluster members.

The pipe character also lets the /resin-admin UI categorize meters into drill downs. Consider the following example.

Example: meter naming
<cluster xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">

  <health:JmxDeltaMeter>
    <name>JVM|Compilation|Compilation Time</name>
    <object-name>java.lang:type=Compilation</object-name>
    <attribute>TotalCompilationTime</attribute>
  </health:JmxDeltaMeter>

</cluster>

In this example, JVM|Compilation|Compilation Time provides the base of the name. For cluster node index 0, Resin prefixes the name with 00|. /resin-admin then uses the cluster index and the first two keys to create drill downs that logically organize meters for display.

[Figure: /resin-admin Graphs view for 00|JVM|Compilation|Compilation Time, Time: 1 Hour]
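
To illustrate the grouping, here is a sketch with two meters that share their first two keys. Under the naming rules described above, both would be expected to appear under the same drill down (00, then JVM, then Thread on node index 0). The meter names are hypothetical; the MBean java.lang:type=Threading and its PeakThreadCount and DaemonThreadCount attributes are standard JVM platform MBean attributes.

```xml
<cluster xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">

  <!-- Hypothetical name; shares the "JVM|Thread" keys with the meter below -->
  <health:JmxMeter>
    <name>JVM|Thread|Peak Thread Count</name>
    <object-name>java.lang:type=Threading</object-name>
    <attribute>PeakThreadCount</attribute>
  </health:JmxMeter>

  <!-- Hypothetical name; differs only in the last, most specific key -->
  <health:JmxMeter>
    <name>JVM|Thread|Daemon Thread Count</name>
    <object-name>java.lang:type=Threading</object-name>
    <attribute>DaemonThreadCount</attribute>
  </health:JmxMeter>

</cluster>
```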

JMX meters

Virtually any local numeric JMX MBean attribute can be graphed using JMX meters.

<health:JmxMeter>

child of <cluster>

Creates a meter that graphs the current value of a numeric JMX MBean attribute.

<health:JmxMeter> Attributes
ATTRIBUTE    DESCRIPTION                                                          TYPE    DEFAULT
name         The name of the meter to display in /resin-admin (see meter names)   String  N/A
objectName   The JMX MBean name                                                   String  N/A
attribute    The MBean attribute to sample                                        String  N/A

Example: <health:JmxMeter> in health.xml
<cluster xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">

  <health:JmxMeter>
    <name>OS|Memory|Physical Memory Free</name>
    <object-name>java.lang:type=OperatingSystem</object-name>
    <attribute>FreePhysicalMemorySize</attribute>
  </health:JmxMeter>

</cluster>

<health:JmxDeltaMeter>

child of <cluster>

Creates a meter that graphs the difference between the current and previous values of a numeric JMX MBean attribute.

<health:JmxDeltaMeter> Attributes
ATTRIBUTE    DESCRIPTION                                                          TYPE    DEFAULT
name         The name of the meter to display in /resin-admin (see meter names)   String  N/A
objectName   The JMX MBean name                                                   String  N/A
attribute    The MBean attribute to sample                                        String  N/A

Example: <health:JmxDeltaMeter> in health.xml
<cluster xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">

  <health:JmxDeltaMeter>
    <name>JVM|Compilation|Compilation Time</name>
    <object-name>java.lang:type=Compilation</object-name>
    <attribute>TotalCompilationTime</attribute>
  </health:JmxDeltaMeter>

</cluster>
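
The choice between the two meter types depends on whether the attribute is a current value or a cumulative counter. The sketch below contrasts them on the standard java.lang:type=ClassLoading MBean: LoadedClassCount is the number of classes currently loaded, so it is graphed directly, while TotalLoadedClassCount only ever grows, so the change between samples is more informative. The meter names are hypothetical.

```xml
<cluster xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">

  <!-- Current value: number of classes loaded right now -->
  <health:JmxMeter>
    <name>JVM|Class Loading|Loaded Classes</name>
    <object-name>java.lang:type=ClassLoading</object-name>
    <attribute>LoadedClassCount</attribute>
  </health:JmxMeter>

  <!-- Cumulative counter: graph the change per sample, not the ever-growing total -->
  <health:JmxDeltaMeter>
    <name>JVM|Class Loading|Classes Loaded Per Sample</name>
    <object-name>java.lang:type=ClassLoading</object-name>
    <attribute>TotalLoadedClassCount</attribute>
  </health:JmxDeltaMeter>

</cluster>
```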

Statistical Analysis

Detecting Anomalies

Meters alone are useful for manual inspection in /resin-admin, since every meter can be graphed. Resin also provides an automatic analysis tool called AnomalyAnalyzer. AnomalyAnalyzer compares the current meter value against the meter's average value and checks for deviations, so unusual changes such as a spike in blocked threads can be detected.

<health:AnomalyAnalyzer>

child of <cluster>

AnomalyAnalyzer examines a meter value, checking for deviations from the average value. Unusual changes, such as a spike in blocked threads, can then be detected and logged, and can trigger health actions.

<health:AnomalyAnalyzer> Attributes
ATTRIBUTE        DESCRIPTION                                                                  TYPE    DEFAULT
meter            Name of the meter to analyze (i.e. from <health:JmxMeter>)                   String  required
health-event     A string to use to match using <health:IfHealthEvent>                        String  None: when absent no health event will fire
min-samples      Minimum number of samples required to calculate an average                   int     60 (typically 1 hour of data)
sigma-threshold  The number of standard deviations for a sample to be considered an anomaly   int     5

Example: <health:AnomalyAnalyzer> in health.xml
<cluster xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">
         
  <health:JmxMeter>
    <name>JVM|Thread|JVM Blocked Count</name>
    <objectName>resin:type=JvmThreads</objectName>
    <attribute>BlockedCount</attribute>
  </health:JmxMeter>

  <health:AnomalyAnalyzer>
    <meter>JVM|Thread|JVM Blocked Count</meter>
    <health-event>caucho.thread.anomaly.jvm-blocked</health-event>
  </health:AnomalyAnalyzer>

  <health:DumpThreads>
    <health:IfHealthEvent regexp="caucho.thread"/>
    <health:IfNotRecent time="15m"/>
  </health:DumpThreads>
  
</cluster>
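
The min-samples and sigma-threshold attributes from the table above can be set to tune sensitivity. The following sketch assumes they are written as child elements in the same way as <meter> and <health-event>. The values 120 and 3 are illustrative assumptions, not recommendations: a lower sigma-threshold flags smaller deviations, and a higher min-samples delays analysis until more baseline data is collected.

```xml
<cluster xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">

  <health:JmxMeter>
    <name>JVM|Thread|JVM Blocked Count</name>
    <objectName>resin:type=JvmThreads</objectName>
    <attribute>BlockedCount</attribute>
  </health:JmxMeter>

  <health:AnomalyAnalyzer>
    <meter>JVM|Thread|JVM Blocked Count</meter>
    <health-event>caucho.thread.anomaly.jvm-blocked</health-event>
    <!-- Assumed value: collect 120 samples before calculating an average (default 60) -->
    <min-samples>120</min-samples>
    <!-- Assumed value: treat samples beyond 3 standard deviations as anomalies (default 5) -->
    <sigma-threshold>3</sigma-threshold>
  </health:AnomalyAnalyzer>

</cluster>
```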

Standard Anomaly detection

The default health.xml configures some general anomaly analysis. In general, anomaly detection can tell you when something went wrong in the server. It looks for unusual spikes of behavior by recording an average baseline for a value and then looking for deviations.

  • "File Descriptor Count" - counts open files and open TCP sockets. Examples include denial of service attacks (TCP) or open file leaks.
  • "JVM Thread Count" - detects thread spawning. Examples include services that spawn too many threads, a likely application bug.
  • "JVM Runnable Count" - detects active threads. Examples include CPU spikes or infinite looping code, a likely application bug.
  • "JVM Waiting Count" - detects threads waiting for other threads. Examples include synchronization bottlenecks like deadlocks or livelocks, likely application or library bugs.
  • "JVM Blocked Count" - detects threads blocked waiting to acquire a lock held by another thread. Examples include synchronization bottlenecks like lock contention or deadlocks, likely application or library bugs.
  • "Database|Connection Active" - detects database connection spikes. Examples include database problems.
  • "HTTP|Request Time" - detects spikes in HTTP request time. Examples include problems in the application, e.g. blocking. A thread dump is useful for debugging further.
  • "HTTP|Ping Time" - detects spikes in HTTP ping time. Examples include problems in the application, e.g. blocking. A thread dump is useful for debugging further.
  • "Port|Throttle Disconnect Count" - detects throttling of attempted TCP connections. Examples include DoS attacks or an overloaded system.
  • "HTTP|400" - detects spikes in HTTP 400 client error responses (bad requests).
  • "HTTP|500" - detects spikes in server exceptions. Indication of application bugs.
  • "Cluster|Message Read Count" - detects overloads in cluster messages. Indication of a Resin cluster issue.
  • "Cluster|Message Write Count" - detects overloads in cluster messages. Indication of a Resin cluster issue.

Reacting to Anomalies

The <health-event> attribute of AnomalyAnalyzer allows us to tie health actions to a detected anomaly by using the <health:IfHealthEvent> condition.

<health:IfHealthEvent>

child of a health action such as <health:DumpThreads>

Causes an action to fire in response to a matching health event. This is usually used in combination with <health:AnomalyAnalyzer> and its <health-event> attribute.

<health:IfHealthEvent> Attributes
ATTRIBUTE  DESCRIPTION                                   TYPE                     DEFAULT
regexp     A regular expression the event must match.    java.util.regex.Pattern  required

Example: <health:IfHealthEvent> in health.xml
<cluster xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">
         
  <health:JmxMeter>
    <name>JVM|Thread|JVM Blocked Count</name>
    <objectName>resin:type=JvmThreads</objectName>
    <attribute>BlockedCount</attribute>
  </health:JmxMeter>

  <health:AnomalyAnalyzer>
    <meter>JVM|Thread|JVM Blocked Count</meter>
    <health-event>caucho.thread.anomaly.jvm-blocked</health-event>
  </health:AnomalyAnalyzer>

  <health:DumpThreads>
    <health:IfHealthEvent regexp="caucho.thread"/>
    <health:IfNotRecent time="15m"/>
  </health:DumpThreads>
  
</cluster>
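
Because the condition takes a regular expression, one action can react to several analyzers whose event names share a prefix. The sketch below rests on several assumptions: the second meter name and the RunnableCount attribute are inferred by analogy with the blocked-count example and are not confirmed by this page, the second event name is hypothetical, and prefix matching is inferred from the example above, where "caucho.thread" matches "caucho.thread.anomaly.jvm-blocked". If health.xml already defines these meters, only the analyzers and the action would be needed.

```xml
<cluster xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">

  <health:JmxMeter>
    <name>JVM|Thread|JVM Blocked Count</name>
    <objectName>resin:type=JvmThreads</objectName>
    <attribute>BlockedCount</attribute>
  </health:JmxMeter>

  <!-- Assumption: RunnableCount exists on the same MBean, by analogy with BlockedCount -->
  <health:JmxMeter>
    <name>JVM|Thread|JVM Runnable Count</name>
    <objectName>resin:type=JvmThreads</objectName>
    <attribute>RunnableCount</attribute>
  </health:JmxMeter>

  <health:AnomalyAnalyzer>
    <meter>JVM|Thread|JVM Blocked Count</meter>
    <health-event>caucho.thread.anomaly.jvm-blocked</health-event>
  </health:AnomalyAnalyzer>

  <!-- Hypothetical event name that shares the caucho.thread.anomaly prefix -->
  <health:AnomalyAnalyzer>
    <meter>JVM|Thread|JVM Runnable Count</meter>
    <health-event>caucho.thread.anomaly.jvm-runnable</health-event>
  </health:AnomalyAnalyzer>

  <!-- One thread dump for either anomaly, at most once every 15 minutes -->
  <health:DumpThreads>
    <health:IfHealthEvent regexp="caucho\.thread\.anomaly"/>
    <health:IfNotRecent time="15m"/>
  </health:DumpThreads>

</cluster>
```

Escaping the dots keeps the pattern from matching unrelated event names, since an unescaped dot in a regular expression matches any character.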

Copyright © 1998-2015 Caucho Technology, Inc. All rights reserved. Resin ® is a registered trademark. Quercus™ and Hessian™ are trademarks of Caucho Technology.