Metrics and alerting
The core provides a module named `metrics` that collects metrics from the nodes and sends alerts to the Nethesis portal. The module is installed by the `create-cluster` action.
Key points:
- Single instance running on the leader node
- Monitors all cluster nodes automatically
- Removed if the leader node becomes a worker
The module includes the following services:
- Prometheus: port 9091
- Alertmanager: port 9093
- alert-proxy: port 9095
- Grafana: port 3000 (disabled by default, enabled with a Traefik route)
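As a quick sanity check, the ports listed above can be probed with a plain TCP connection. The following is a minimal sketch, not part of the module: the helper names `port_is_open` and `check_services` are hypothetical, and it assumes the services accept connections on `127.0.0.1` of the leader node, which may not match the actual bind address.

```python
import socket

# Ports as listed in this section. The host used below is an assumption:
# the services may be bound to a different address on the leader node.
SERVICES = {
    "Prometheus": 9091,
    "Alertmanager": 9093,
    "alert-proxy": 9095,
    "Grafana": 3000,  # disabled by default, so "not reachable" may be expected
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_services(host: str = "127.0.0.1", services: dict = SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}
```

Calling `check_services()` on the leader node returns a dictionary such as `{"Prometheus": True, ...}`. An open port only shows that something is listening, not that the service behind it is healthy.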
Configuration:
- Prometheus and Alertmanager configurations are recreated on Prometheus restart
- Module restarts on node addition/removal
- alert-proxy restarts on subscription change to send alerts to Nethesis portals
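After a node is added and the module has restarted, one way to confirm that every target is being scraped is to ask Prometheus for its built-in `up` metric through the standard HTTP query API (`/api/v1/query`). The sketch below is illustrative only: the base URL `http://127.0.0.1:9091` is an assumption derived from the port above, and the function names `query_up` and `down_targets` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable locally on the leader node at this URL.
PROMETHEUS_URL = "http://127.0.0.1:9091"


def query_up(base_url: str = PROMETHEUS_URL) -> dict:
    """Run the instant query `up` and return the decoded JSON response."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def down_targets(payload: dict) -> list:
    """Return the sorted `instance` labels of targets whose `up` value is 0."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    down = []
    for sample in payload["data"]["result"]:
        # Each instant-vector sample carries a [timestamp, "value"] pair.
        _timestamp, value = sample["value"]
        if float(value) == 0:
            down.append(sample["metric"].get("instance", "<unknown>"))
    return sorted(down)
```

For example, `down_targets(query_up())` returns an empty list when all scraped targets report `up == 1`. Which labels identify a cluster node depends on the generated Prometheus configuration, so the `instance` label used here should be treated as a placeholder.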
If a subscription is enabled, alerts are sent to Nethesis portals by default.
Please refer to the module README for details on alert configuration and customization.