services.x-alarms

Service-level x-alarms reference

services:
  app01:
    x-alarms:
      Predefined:
        RuleName:                 # Name of the predefined alarm, e.g. HighCpuUsageAndMaxScaledOut
          Topics: []              # Similar to other x-alarms settings
          Settings: {}            # Input value overrides

Predefined alarms

Common Settings

The following properties can be set to override the defaults. When the alarms are composite, only the “Primary” alarm is updated.

Setting	Default
DatapointsToAlarm	10
EvaluationPeriods	5
Period	60 (seconds)
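To make the override behaviour concrete, here is a minimal Python sketch, assuming a rule is modelled as a dict of alarms in which exactly one alarm is flagged as primary. The function `resolve_alarm_settings`, the alarm names and the `Primary`/`Threshold` keys are hypothetical and not part of ComposeX; the assumed precedence is defaults, then the alarm's own values, then your overrides, and secondary alarms are left untouched.

```python
# Defaults from the table above; Period is expressed in seconds.
COMMON_DEFAULTS = {"DatapointsToAlarm": 10, "EvaluationPeriods": 5, "Period": 60}


def resolve_alarm_settings(alarms, overrides):
    """Apply common-setting overrides to the primary alarm of a rule.

    `alarms` maps a hypothetical alarm name to its settings dict; exactly
    one entry is expected to carry "Primary": True. Keys in `overrides`
    that are not common settings (e.g. CPUUtilization) are ignored here.
    """
    common = {key: value for key, value in overrides.items() if key in COMMON_DEFAULTS}
    resolved = {}
    for name, settings in alarms.items():
        if settings.get("Primary"):
            # Assumed precedence: defaults, then the alarm's own values, then user overrides.
            resolved[name] = {**COMMON_DEFAULTS, **settings, **common}
        else:
            # Secondary alarms of a composite rule keep their own settings.
            resolved[name] = dict(settings)
    return resolved
```

For example, passing `{"Period": 300}` for a composite rule changes the primary alarm's period to 300 seconds, while the secondary alarm keeps its own `Period`.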

Attention

Define a scaling range (x-scaling.Range) to allow the service to scale out. The alarms below are only active when scaling rules are defined.
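The `MAX()` default used for `RunningTaskCount` in the rules below comes from the service's `x-scaling.Range`. Here is a minimal Python sketch of that lookup, assuming `Range` is written as a `"min-max"` string such as `"1-10"`; `max_task_count` is a hypothetical helper, not ComposeX code. A `None` result stands for "no scaling range defined", which per the note above means the alarms stay inactive.

```python
from typing import Optional


def max_task_count(scaling: Optional[dict]) -> Optional[int]:
    """Return the upper bound of an x-scaling Range such as "1-10".

    Assumes Range is a "min-max" string. Returns None when the service has
    no Range, in which case the predefined alarms would not be active.
    """
    if not scaling or "Range" not in scaling:
        return None
    parts = str(scaling["Range"]).split("-")
    if len(parts) != 2:
        raise ValueError(f"expected 'min-max', got {scaling['Range']!r}")
    # int() raises ValueError for non-numeric bounds such as "a-b".
    low, high = int(parts[0]), int(parts[1])
    if low < 0 or high < low:
        raise ValueError(f"invalid scaling range: {scaling['Range']!r}")
    return high
```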

HighCpuUsageAndMaxScaledOut

Setting name	Default value	Primary?	Comment
CPUUtilization	75	Y	Percentage, float
RunningTaskCount	MAX()	N	Count, int. Defaults to the maximum value of x-scaling.Range

This rule triggers an alarm when the CPUUtilization of the service goes over the threshold and the running task count is equal to the maximum scaling capacity (or to the overridden RunningTaskCount value).
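The rule combines two conditions. The Python sketch below simulates that logic locally, assuming the primary CPU alarm behaves like a CloudWatch "M out of N" alarm (at least DatapointsToAlarm breaching datapoints among the last EvaluationPeriods) with a strict greater-than comparison, and reducing the secondary condition to the current task count. The function names are hypothetical; the real alarms are evaluated by CloudWatch, which also applies missing-data handling that this sketch ignores.

```python
from typing import Sequence


def m_out_of_n_breaching(datapoints: Sequence[float], threshold: float,
                         evaluation_periods: int, datapoints_to_alarm: int) -> bool:
    """True when at least `datapoints_to_alarm` of the last `evaluation_periods`
    datapoints are strictly above `threshold`."""
    if not 0 < datapoints_to_alarm <= evaluation_periods:
        # An "M out of N" alarm needs 1 <= M <= N.
        raise ValueError("DatapointsToAlarm must be between 1 and EvaluationPeriods")
    window = list(datapoints)[-evaluation_periods:]
    return sum(1 for value in window if value > threshold) >= datapoints_to_alarm


def high_cpu_and_max_scaled_out(cpu_datapoints: Sequence[float], running_tasks: int,
                                max_tasks: int, *, evaluation_periods: int,
                                datapoints_to_alarm: int, cpu_threshold: float = 75.0) -> bool:
    """Local simulation of the composite rule: both conditions must hold."""
    # Primary alarm: CPUUtilization (percent) over the threshold.
    primary = m_out_of_n_breaching(cpu_datapoints, cpu_threshold,
                                   evaluation_periods, datapoints_to_alarm)
    # Secondary alarm: the service has reached its maximum task count.
    secondary = running_tasks >= max_tasks
    return primary and secondary
```

With `cpu_threshold=50`, `max_tasks=4`, `evaluation_periods=5` and `datapoints_to_alarm=3`, the simulated rule fires once three of the last five CPU datapoints exceed 50% while four tasks are running.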

Example with a 50% CPU usage threshold and RunningTaskCount overridden to 4 tasks

services:
  app01:
    x-alarms:
      Predefined:
        HighCpuUsageAndMaxScaledOut:
          Settings:
            CPUUtilization: 50           # In percent
            RunningTaskCount: 4          # Number of tasks to evaluate against

HighRamUsageAndMaxScaledOut

Setting name	Default value	Primary?	Comment
MemoryUtilization	75	Y	Percentage, float
RunningTaskCount	MAX()	N	Count, int. Defaults to the maximum value of x-scaling.Range

This rule triggers an alarm when the MemoryUtilization of the service goes over the threshold and the running task count is equal to the maximum scaling capacity (or to the overridden RunningTaskCount value).

Example with a 50% memory usage threshold and RunningTaskCount overridden to 4 tasks

services:
  app01:
    x-alarms:
      Predefined:
        HighRamUsageAndMaxScaledOut:
          Settings:
            MemoryUtilization: 50        # In percent
            RunningTaskCount: 4          # Number of tasks to evaluate against

A little bit of philosophy behind alarms

I love alarms, but one should only have alarms that are relevant to business-critical impact. Alerting for the sake of alerting can actually create more work for you. Equally, rules with overly aggressive thresholds will more often than not end up raising false positives.

For example, high CPU usage alarms are useless if they trigger neither an action nor a response, such as autoscaling. You are paying for 100% of your CPU, and unless you are on a burstable instance, you want to use as much of it as possible to get your money's worth. High CPU usage on burstable instances, on the other hand, is a big deal, and you want to act on it to avoid throttling.

So, as valuable as alarms are, you should always aim for ones that trigger a corrective action, automated wherever possible, and otherwise alert people so that the risks get mitigated.

JSON Schema

Model

services.x-alarms specification

services.x-alarms
The services.x-alarms specification for ComposeX
type	object
properties
Predefined	type	object
additionalProperties	False
definitions
predefinedAlarms	type	object
	properties
	Topics	type	array
		items	type	string
	Settings	type	object
		properties
		Period	type	integer
		EvaluationPeriods	type	integer
		DatapointsToAlarm	type	integer
		RunningTaskCount	type	integer
		CPUUtilization	type	integer / number
		MemoryUtilization	type	integer / number
	additionalProperties	False

Definition

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "services.x-alarms.spec.json",
  "id": "services.x-alarms",
  "type": "object",
  "title": "services.x-alarms specification",
  "description": "The services.x-alarms specification for ComposeX",
  "additionalProperties": false,
  "properties": {
    "Predefined": {
      "type": "object",
      "additionalProperties": {
        "$ref": "#/definitions/predefinedAlarms"
      }
    }
  },
  "definitions": {
    "predefinedAlarms": {
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "Topics": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "Settings": {
          "type": "object",
          "properties": {
            "Period": {
              "type": "integer"
            },
            "EvaluationPeriods": {
              "type": "integer"
            },
            "DatapointsToAlarm": {
              "type": "integer"
            },
            "RunningTaskCount": {
              "type": "integer"
            },
            "CPUUtilization": {
              "type": [
                "integer",
                "number"
              ]
            },
            "MemoryUtilization": {
              "type": [
                "integer",
                "number"
              ]
            }
          }
        }
      }
    }
  }
}
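As a rough illustration of what the `predefinedAlarms` definition accepts, here is a dependency-free Python sketch that checks `Predefined` entries by hand. It is not ComposeX's own validation (a real project would more likely use a JSON Schema library), and `validate_predefined` and `validate_predefined_alarm` are hypothetical names. Following draft-07, a float with no fractional part such as `60.0` counts as an integer, and booleans are rejected.

```python
from numbers import Real

INTEGER_SETTINGS = {"Period", "EvaluationPeriods", "DatapointsToAlarm", "RunningTaskCount"}
NUMBER_SETTINGS = {"CPUUtilization", "MemoryUtilization"}
ALLOWED_KEYS = {"Topics", "Settings"}  # predefinedAlarms sets additionalProperties: false


def _is_integer(value):
    # Draft-07 "integer": any number with a zero fractional part; bool is excluded
    # because it is a subclass of int in Python.
    if isinstance(value, bool):
        return False
    return isinstance(value, int) or (isinstance(value, float) and value.is_integer())


def _is_number(value):
    return isinstance(value, Real) and not isinstance(value, bool)


def validate_predefined_alarm(name, rule):
    """Return error messages for one Predefined entry (empty list if valid)."""
    if not isinstance(rule, dict):
        return [f"{name}: must be an object"]
    errors = [f"{name}: unexpected property {key!r}" for key in rule if key not in ALLOWED_KEYS]
    topics = rule.get("Topics", [])
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        errors.append(f"{name}.Topics: must be an array of strings")
    settings = rule.get("Settings", {})
    if not isinstance(settings, dict):
        errors.append(f"{name}.Settings: must be an object")
        return errors
    for key, value in settings.items():
        if key in INTEGER_SETTINGS and not _is_integer(value):
            errors.append(f"{name}.Settings.{key}: must be an integer")
        elif key in NUMBER_SETTINGS and not _is_number(value):
            errors.append(f"{name}.Settings.{key}: must be a number")
    # Settings has no additionalProperties restriction, so unknown keys are accepted.
    return errors


def validate_predefined(predefined):
    """Validate every entry of a Predefined mapping."""
    if not isinstance(predefined, dict):
        return ["Predefined: must be an object"]
    errors = []
    for name, rule in predefined.items():
        errors.extend(validate_predefined_alarm(name, rule))
    return errors
```

For instance, `{"HighCpuUsageAndMaxScaledOut": {"Settings": {"CPUUtilization": 50, "RunningTaskCount": 4}}}` yields no errors, whereas a string `MemoryUtilization` value is reported.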