Metrics and alerting

The core provides a module named metrics that collects metrics from the cluster nodes and sends alerts to the Nethesis portals. The module is installed by the create-cluster action.

The module includes the following services:

  • Prometheus: port 9091
  • Alertmanager: port 9093
  • alert-proxy: port 9095
  • Grafana: port 3000 (disabled by default, enabled with Traefik route)

Key points:

  • Single instance running on the leader node
  • Monitors all cluster nodes automatically
  • Removed if the leader node becomes a worker
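As a quick sanity check, the service ports listed in this section can be probed from the leader node. The following is a minimal sketch, not part of the module: it assumes the services listen on 127.0.0.1 (the bind address is not stated here), and the names SERVICE_PORTS, port_is_open and check_services are hypothetical helpers introduced for illustration.

```python
import socket

# Ports as listed in this section. The loopback address used below is an
# assumption; adjust it if the services bind elsewhere.
SERVICE_PORTS = {
    "prometheus": 9091,
    "alertmanager": 9093,
    "alert-proxy": 9095,
    "grafana": 3000,  # disabled by default, so "closed" is expected here
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="127.0.0.1"):
    """Return a mapping of service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        state = "open" if is_up else "closed"
        print(f"{name:13s} port {SERVICE_PORTS[name]:<5d} {state}")
```

An open port only shows that something accepts TCP connections on it; it does not prove the service is healthy.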

Configuration:

  • The Prometheus and Alertmanager configurations are regenerated every time Prometheus restarts
  • The module restarts whenever a node is added or removed
  • alert-proxy restarts when the subscription changes, so that alerts are sent to the Nethesis portals
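After a node is added or removed, one way to confirm that Prometheus picked up the change is to list its scrape targets. The sketch below uses the standard Prometheus HTTP API endpoint /api/v1/targets. The base URL (loopback, port 9091, no path prefix) is an assumption, and fetch_targets, summarize_targets and unhealthy are hypothetical helper names, not part of the module.

```python
import json
import urllib.request

# Assumption: Prometheus is reachable on the leader node at this address,
# without a path prefix.
PROMETHEUS_URL = "http://127.0.0.1:9091"


def fetch_targets(base_url=PROMETHEUS_URL, timeout=5.0):
    """Fetch the decoded JSON body of Prometheus' /api/v1/targets endpoint."""
    url = f"{base_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_targets(payload):
    """Map (job, instance) of each active target to its health string."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return {
        (t["labels"].get("job"), t["labels"].get("instance")): t.get("health")
        for t in targets
    }


def unhealthy(summary):
    """Return the sorted (job, instance) pairs whose health is not 'up'."""
    return sorted(key for key, health in summary.items() if health != "up")


# Example usage on the leader node (performs a network request):
#   summary = summarize_targets(fetch_targets())
#   print(unhealthy(summary))
```

Which jobs and instance labels appear depends on the generated Prometheus configuration, so the sketch makes no assumption about them.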

If a subscription is enabled, alerts are sent to Nethesis portals by default.

Please refer to the module README for details on alert configuration and customization.