Setting Up a Monitoring Host¶
Our monitoring stack is based on Prometheus, Grafana, Loki and AlertManager. They are all open-source tools with plenty of documentation and support online.
Monitoring Anapaya appliances has a few technical requirements:
The monitoring host must be able to reach the management interface of the target appliances.
Firewall rules must allow opening an HTTP(S) connection on the monitoring port of the appliance.
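Both requirements can be checked from the monitoring host before any other component is installed. The following is a minimal sketch, not an official tool: the function name can_connect is hypothetical, and it only tests whether a TCP connection can be opened, not whether metrics are actually served.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, can_connect("192.0.2.10", 42001) checks a placeholder appliance address on the default telemetry port. A False result usually points to routing or firewall rules between the monitoring host and the appliance.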
Enabling Telemetry in the Appliance¶
In order to collect telemetry data, you need to enable telemetry exporting in the appliance.
In the appliance configuration of the monitored host, verify that there is an entry for telemetry, as shown in the snippet below. The address field determines where the appliance exposes its metrics and has the format ip:port. By default, the telemetry address is 0.0.0.0:42001. Prometheus is configured to scrape the metrics at this address.
{
  "management": {
    "telemetry": {
      "address": "<address>"
    }
  }
}
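To catch typos in the telemetry entry early, the address can be validated before the configuration is applied. This is a sketch under assumptions: the helper name telemetry_address is hypothetical, the configuration is assumed to be available as JSON text, and tolerating a bracketed IPv6 host is a guess rather than documented appliance behavior.

```python
import ipaddress
import json


def telemetry_address(config_text: str) -> tuple[str, int]:
    """Extract and validate management.telemetry.address from an appliance config."""
    config = json.loads(config_text)
    address = config["management"]["telemetry"]["address"]
    host, sep, port_text = address.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected ip:port, got {address!r}")
    # Raises ValueError if the host part is not an IP address.
    # Stripping brackets for IPv6 is an assumption, not documented behavior.
    ipaddress.ip_address(host.strip("[]"))
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port
```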
Setting up Prometheus¶
In order to set up Prometheus, follow the official Prometheus instructions. Specifically,
Ensure you have the latest version of Prometheus installed. Consult the installation guide for reference.
Follow the instructions in Section Configure Prometheus to monitor the sample targets.
To monitor each appliance, add the following snippet in the prometheus.yml file. For each host that will be monitored, add an entry in the targets section. The appliance address is the one configured in the telemetry section of the appliance configuration, as shown in Enabling Telemetry in the Appliance. It has the format host:port, where host can be either an IP address or a hostname.
- job_name: 'anapaya-appliance'
  honor_labels: true
  metric_relabel_configs:
    # Add this config if you are using the Anapaya Grafana dashboards.
    - source_labels: ['hostname']
      target_label: 'shortname'
    - source_labels: ['__name__']
      regex: 'target_info'
      action: drop
  static_configs:
    - targets:
        - <appliance address>
      labels:
        product: <product>
Note
If you use the recommended Grafana dashboards, make sure you add the correct product label as the dashboards require this label to be set accordingly.
The available labels are:
core, edge, gate, ca
Start Prometheus. The exact command depends on the method of installation.
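When many appliances are monitored, writing one static_configs entry per host by hand is error-prone. The sketch below generates that block from a mapping of appliance address to product label. The function name render_static_configs and the example addresses are hypothetical. The output starts at column zero and must be indented to fit under the job in prometheus.yml.

```python
ALLOWED_PRODUCTS = {"core", "edge", "gate", "ca"}


def render_static_configs(appliances: dict[str, str]) -> str:
    """Render a static_configs block with one entry per {address: product} pair."""
    lines = ["static_configs:"]
    for address, product in sorted(appliances.items()):
        if product not in ALLOWED_PRODUCTS:
            raise ValueError(f"unknown product label: {product!r}")
        lines += [
            "  - targets:",
            f"      - '{address}'",
            "    labels:",
            f"      product: {product}",
        ]
    return "\n".join(lines)
```

Rejecting unknown product labels up front helps avoid dashboards that silently show no data because of a misspelled label.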
Recording and Alerting Rules¶
Prometheus allows the configuration of rules for recording data or creating alerts when an event happens. These alerts can later be picked up by AlertManager and be integrated with your alerting system. You can specify the events that trigger an alert, the scope and severity of the alert, and also provide a description and summary of the firing alert. Below, we provide two examples of how to monitor the state of a service and the state of an interface.
- alert: SystemServiceDown
  expr: up == 0
  for: 1m
- alert: SCIONInterfaceStateDown
  expr: router_interface_up == 0
  for: 1m
You can find more information on which metrics can be scraped by Prometheus in Telemetry.
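Before committing an expression to a rule file, it can help to evaluate it against the Prometheus HTTP API and inspect which series match. The sketch below uses the instant-query endpoint /api/v1/query. The function names and the base URL in the usage note are assumptions, and the instance label is only present if your scrape configuration keeps it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(prometheus_base: str, expr: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def matching_instances(response_text: str) -> list[str]:
    """Return the instance label of every series in an instant-query response."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error', 'unknown error')}")
    return [series["metric"].get("instance", "") for series in body["data"]["result"]]


def down_targets(prometheus_base: str) -> list[str]:
    """Evaluate 'up == 0' and list the targets that are currently down."""
    with urlopen(query_url(prometheus_base, "up == 0"), timeout=10) as response:
        return matching_instances(response.read().decode())
```

Calling down_targets("http://localhost:9090") assumes Prometheus listens on its default port on the monitoring host. An empty list means no scraped target is currently down.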
Recommended Alert Rules¶
Anapaya provides a recommended rule set for alerts that can be used as a starting point. For the published alerts, we provide a Troubleshooting & Runbooks page to consult when one of the alerts is triggered. The files are accessible in Anapaya’s software repository on cloudsmith.io.
Depending on the product you are using, there is a predefined set of alert rules, i.e., anapaya-alerts-core for CORE, anapaya-alerts-edge for EDGE, anapaya-alerts-gate for GATE, and anapaya-alerts-scion-ca for the Anapaya SCION CA. There is also an anapaya-alerts-external package that includes all recommended alerts.
For example, to download the latest version of the EDGE alert rules, run:
curl -O https://dl.cloudsmith.io/<access_token>/anapaya/stable/raw/names/anapaya-alerts-edge/versions/latest/anapaya-alerts-edge-latest.yml
The <access_token> is provided to you by Anapaya as part of your software license.
Adjust your Prometheus configuration to use the downloaded alert rule file. Follow the official instructions on adding an alert rule file to your Prometheus configuration.
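If you fetch the rule files for several products from a script, the download URL can be derived from the product name. This sketch assumes that every package follows the same URL pattern as the EDGE example above, which the documentation only confirms for EDGE. The function name alert_rules_url and the mapping keys are hypothetical.

```python
ALERT_PACKAGES = {
    "core": "anapaya-alerts-core",
    "edge": "anapaya-alerts-edge",
    "gate": "anapaya-alerts-gate",
    "ca": "anapaya-alerts-scion-ca",
    "external": "anapaya-alerts-external",
}


def alert_rules_url(access_token: str, product: str) -> str:
    """Build the download URL for the latest alert rules of a product."""
    package = ALERT_PACKAGES[product]
    # Assumption: all packages share the URL layout shown for EDGE.
    return (
        f"https://dl.cloudsmith.io/{access_token}/anapaya/stable/raw/names/"
        f"{package}/versions/latest/{package}-latest.yml"
    )
```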
Setting up Grafana¶
In order to set up Grafana, follow the official Grafana instructions. Before setting up Grafana, ensure that you have set up and started Prometheus following the instructions in Setting up Prometheus. At a high level, you need to:
Install Grafana.
Log in to Grafana.
Configure Prometheus as a datasource for Grafana.
Start building queries and dashboards. Grafonnet provides a way to create dashboards programmatically. An explanation of the library can be found in this blog post.
Recommended Grafana Dashboards¶
Anapaya provides a recommended set of Grafana dashboards that can be used as a starting point. The archives including the JSON dashboards are accessible in Anapaya’s software repository on cloudsmith.io. The JSON files can then be imported into Grafana.
Depending on the product you are using, there is a predefined archive including multiple dashboards, i.e., anapaya-dashboards-core for CORE, anapaya-dashboards-edge for EDGE, anapaya-dashboards-gate for GATE, and anapaya-dashboards-scion-ca for the Anapaya SCION CA. There is also an anapaya-dashboards-external archive that includes all recommended dashboards.
For example, to download the latest version of the EDGE dashboards, run:
curl -O https://dl.cloudsmith.io/<access_token>/anapaya/stable/raw/names/anapaya-dashboards-edge/versions/latest/anapaya-dashboards-edge-latest.zip
The <access_token> is provided to you by Anapaya as part of your software license.
In order to import a JSON dashboard, follow the official instructions.
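Before importing, the downloaded archive has to be unpacked. The sketch below extracts it and returns every JSON file that parses cleanly, so that a truncated download is noticed before the import. The function name extract_dashboards is hypothetical, and the internal layout of the archive is not documented here, so the search is recursive.

```python
import json
import zipfile
from pathlib import Path


def extract_dashboards(archive: str, dest: str) -> list[Path]:
    """Unpack the archive and return the paths of all valid JSON files in it."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    dashboards = []
    for path in sorted(Path(dest).rglob("*.json")):
        # json.loads raises an error if a file is not valid JSON.
        json.loads(path.read_text(encoding="utf-8"))
        dashboards.append(path)
    return dashboards
```

For example, extract_dashboards("anapaya-dashboards-edge-latest.zip", "dashboards") unpacks the EDGE archive into a local dashboards directory.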
Setting up Loki¶
Monitoring Host¶
In order to export the logs from each monitored host, we use Loki. Follow the official instructions on setting up Loki on the monitoring host.
Ensure that Loki is added as a data source, as explained in the instructions.
Monitored Hosts¶
On the hosts that you want to monitor, add the following snippet in the management/telemetry section of the appliance configuration. Make sure to replace the fields in <> with the actual values for the host.
url is the URL of the Loki instance. The appliance sends its generated logs to that URL, and users can also access the logs there.
address is the telemetry address already configured in Enabling Telemetry in the Appliance.
{
  "management": {
    "telemetry": {
      "address": "<address>",
      "logging": {
        "logging_type": "LOKI",
        "loki": {
          "basic_auth": {
            "password": "<password>",
            "username": "<username>"
          },
          "url": "<url>"
        }
      }
    }
  }
}
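When the same Loki settings have to be applied to several appliances, the logging block can be merged into each existing configuration programmatically. This is a sketch, assuming the configuration is handled as a parsed JSON object. The function name with_loki_logging and the example URL in the usage note are hypothetical.

```python
import json


def with_loki_logging(config: dict, url: str, username: str, password: str) -> dict:
    """Return a copy of config with the LOKI logging block under management.telemetry."""
    updated = json.loads(json.dumps(config))  # deep copy, leaves the input untouched
    telemetry = updated.setdefault("management", {}).setdefault("telemetry", {})
    telemetry["logging"] = {
        "logging_type": "LOKI",
        "loki": {
            "basic_auth": {"password": password, "username": username},
            "url": url,
        },
    }
    return updated
```

Calling with_loki_logging(config, "http://loki.example:3100", "user", "secret") keeps the existing address entry and adds only the logging section.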
Setting up AlertManager¶
In order to export the alerts from each monitored host, we use AlertManager. Follow the official instructions on setting up AlertManager. As stated in the official documentation, AlertManager can be integrated with a wide variety of alerting systems, including email, Slack, and Opsgenie.
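Once a receiver is configured, a synthetic alert can confirm that notifications reach it. The sketch below posts one alert to AlertManager's /api/v2/alerts endpoint. The function names, the alert name, and the severity label are assumptions for illustration, and your routing rules may expect different labels.

```python
import json
from datetime import datetime, timezone
from urllib.request import Request, urlopen


def build_test_alert(alertname: str, severity: str = "info") -> list[dict]:
    """Build a one-alert payload for the POST /api/v2/alerts endpoint."""
    return [{
        "labels": {"alertname": alertname, "severity": severity},
        "annotations": {"summary": "Test alert sent to verify notification routing"},
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }]


def send_alert(alertmanager_base: str, payload: list[dict]) -> int:
    """POST the payload to AlertManager and return the HTTP status code."""
    request = Request(
        f"{alertmanager_base.rstrip('/')}/api/v2/alerts",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=10) as response:
        return response.status
```

For example, send_alert("http://localhost:9093", build_test_alert("MonitoringSelfTest")) assumes AlertManager runs on its default port on the monitoring host.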