A rule is always composed of one or more thresholds (up to 9), which are responsible for applying the rule's native conditions.
When a threshold is met, a monitoring event is generated and made available for publication, replacing any previous event of lower level.
Threshold definition
Any threshold can combine up to 3 conditions:
- Primary condition, based on one of :
pattern
: regular expression.
value
: long value.
signal
: condition set as defined by the rule.
custom
: any combination of parameters, potentially including the above ones.
- Event time based condition (if not specified, count is set to 1) :
time
: long value. Duration of the potential event.
For action based events, the duration is the sum of the event cases within the action. The same applies to the count.
Example : a single action of 10 minutes with 2 thread locker cases of 1 minute and 3 minutes will not match a thread locker rule with a time threshold of 5 minutes, because the cumulated case duration is only 4 minutes.
count
: number of consecutive recording snapshots.
percentage_in_action
: percentage presence of the potential event within an action.
When combined with time, time is the duration of the entire action. The same applies to the count.
Example : a single action of 10 minutes with 2 thread locker cases of 1 minute and 3 minutes will match a thread locker rule with a percentage in action of 40% and a time threshold of 5 minutes : the cases cover 4 of the 10 minutes (40%) and the action itself lasts longer than 5 minutes.
- Task filtering condition :
function
: principal function. Thresholds with function are always applied first within the same level.
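The two thread locker examples above can be checked with a small sketch. This is an illustration of the arithmetic only, not the tool's implementation: the `Action` class and both function names are hypothetical, durations are expressed in minutes for readability, and the inclusive comparison on the time threshold is an assumption (the 40% example only confirms that the percentage comparison is inclusive).

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical model of an action and the event cases found inside it."""
    duration_min: float                      # duration of the whole action
    case_durations_min: list = field(default_factory=list)  # duration of each event case


def matches_time(action, time_threshold_min):
    """Time-only threshold: the event duration is the sum of the event cases."""
    return sum(action.case_durations_min) >= time_threshold_min


def matches_percentage(action, percentage, time_threshold_min):
    """percentage_in_action combined with time: time applies to the entire action."""
    presence = 100.0 * sum(action.case_durations_min) / action.duration_min
    return presence >= percentage and action.duration_min >= time_threshold_min


# 10 minute action with two thread locker cases of 1 and 3 minutes
action = Action(duration_min=10, case_durations_min=[1, 3])
print(matches_time(action, 5))            # False: 1 + 3 = 4 minutes, below 5
print(matches_percentage(action, 40, 5))  # True: 4 / 10 = 40% and the action lasts 10 minutes
```

The same action therefore fails the plain time threshold but passes once the threshold is expressed as a percentage of the action.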
message
: any valuable information about the incident. It should contain a description (possibly of the root cause, if the incident signature is known) and the remediation actions (examples : ask for this fix, apply this workaround, collect the logs, contact your support).
level
: the criticality level of the event. One of : INFO, WARNING, CRITICAL.
sub_level
: classifies the importance of the event within one level.
All rules have by default a sub level with a value from 1 to 5.
It is possible to override it with a value from 6 to 10 (6 overrides 1, 7 overrides 2, and so on), typically to distinguish events with strong meaning.
The combination of the level and the sub level is called the event ranking : C10 is the maximum, I1 is the minimum.
Ranking is used to highlight the events in the JZR Monitoring Alerts sheet (C1-10 -> red spectrum, W1-10 -> yellow spectrum, I1-10 -> blue spectrum).
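As an illustration of how a ranking is derived from the level and sub level, here is a minimal sketch. All names are hypothetical and not part of the product. Sorting by level first and sub level second is an assumption consistent with C10 being the maximum and I1 the minimum.

```python
LEVEL_PREFIX = {"INFO": "I", "WARNING": "W", "CRITICAL": "C"}
SPECTRUM = {"C": "red", "W": "yellow", "I": "blue"}


def ranking(level, sub_level):
    """Build the event ranking, for example ('WARNING', 9) gives 'W9'."""
    if level not in LEVEL_PREFIX:
        raise ValueError(f"unknown level: {level}")
    if not 1 <= sub_level <= 10:
        raise ValueError(f"sub level out of range: {sub_level}")
    return f"{LEVEL_PREFIX[level]}{sub_level}"


def overridden_default(sub_level):
    """Default sub level (1 to 5) replaced by an override value (6 to 10)."""
    if not 6 <= sub_level <= 10:
        raise ValueError("override values range from 6 to 10")
    return sub_level - 5


def rank_key(rank):
    """Sort key: level first (I < W < C), then sub level (assumed ordering)."""
    return ("IWC".index(rank[0]), int(rank[1:]))


ranks = [ranking("WARNING", 9), ranking("INFO", 6), ranking("CRITICAL", 10)]
print(sorted(ranks, key=rank_key))  # ['I6', 'W9', 'C10']
print(overridden_default(7))        # 2
print(SPECTRUM[ranks[0][0]])        # yellow
```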
type
: defines the primary condition (as listed above) and the threshold application scope :
system
: applies to static process parameters such as process card properties (ex: java version). Only one event can be generated.
Applies only in reporting mode.
global
: applies to the whole session. Therefore, only one event can be generated.
session
: applies to each isolated situation. Several events can be generated.
action
: applies at the action level. Only one event is generated per action.
stack
: applies to each situation matching the threshold conditions within one action. Several events can be generated for one action.
Rules usually support either the global and session scopes, or the action and stack scopes, or the system scope : scope validation is performed at rule loading.
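The scope grouping can be pictured with the sketch below. It is not the tool's loader: the function names are hypothetical, the scope-then-condition word order of the `type` value is inferred from values such as `action signal`, and since the grouping is only the usual case, a real rule may accept a different set of scopes.

```python
SCOPE_FAMILIES = [{"global", "session"}, {"action", "stack"}, {"system"}]
CONDITIONS = {"pattern", "value", "signal", "custom"}


def parse_type(type_attr):
    """Split a type value such as 'action signal' into (scope, condition)."""
    parts = type_attr.split()
    if len(parts) != 2:
        raise ValueError(f"expected '<scope> <condition>', got: {type_attr!r}")
    scope, condition = parts
    if not any(scope in family for family in SCOPE_FAMILIES):
        raise ValueError(f"unknown scope: {scope}")
    if condition not in CONDITIONS:
        raise ValueError(f"unknown primary condition: {condition}")
    return scope, condition


def scopes_are_compatible(type_attrs):
    """True if all thresholds of a rule use scopes from a single family."""
    scopes = {parse_type(t)[0] for t in type_attrs}
    return any(scopes <= family for family in SCOPE_FAMILIES)


print(parse_type("action signal"))                              # ('action', 'signal')
print(scopes_are_compatible(["action signal", "stack signal"]))  # True
print(scopes_are_compatible(["action signal", "global value"]))  # False
```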
trust_factor
: each threshold has a default trust factor of 100. Trust factor reflects the level of confidence in the event. Usually it applies to rules related to bug suspicions. The trust_factor value must be between 1 and 100.
<thresholds>
  <threshold type="action signal"
             time="40"
             level="WARNING"
             sub_level="9"
             message="Task showing identical consecutive stacks for long time. Contact your software provider."/>
  <threshold type="action signal"
             count="2"
             function="specific functionality"
             level="INFO"
             sub_level="6"
             trust_factor="90"
             message="Specific functionality is very slow. Apply the following remediation action : ..."/>
</thresholds>
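To see how the documented defaults combine with such a definition, the following sketch reads a thresholds block with Python's standard `xml.etree.ElementTree` module. It is only a reading aid under stated assumptions: `load_thresholds` and the dictionary layout are hypothetical, and the defaults applied (count of 1, trust factor of 100) are the ones described above. The sub level default is left unset because it is defined by each rule.

```python
import xml.etree.ElementTree as ET

XML = """<thresholds>
  <threshold type="action signal" time="40" level="WARNING" sub_level="9"
             message="Task showing identical consecutive stacks for long time. Contact your software provider."/>
  <threshold type="action signal" count="2" function="specific functionality"
             level="INFO" sub_level="6" trust_factor="90"
             message="Specific functionality is very slow. Apply the following remediation action : ..."/>
</thresholds>"""


def load_thresholds(xml_text):
    """Parse threshold elements and apply the documented defaults."""
    root = ET.fromstring(xml_text)
    thresholds = []
    for node in root.findall("threshold"):
        trust = int(node.get("trust_factor", "100"))  # default trust factor is 100
        if not 1 <= trust <= 100:
            raise ValueError(f"trust_factor must be between 1 and 100, got {trust}")
        sub_level = node.get("sub_level")
        thresholds.append({
            "type": node.get("type"),
            "level": node.get("level"),
            "sub_level": int(sub_level) if sub_level is not None else None,
            "count": int(node.get("count", "1")),      # count is 1 if not specified
            "function": node.get("function"),          # None when no task filtering applies
            "trust_factor": trust,
        })
    return thresholds


for threshold in load_thresholds(XML):
    print(threshold["level"], threshold["sub_level"], threshold["trust_factor"])
```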