Setting Up a Monitoring Host

Our monitoring stack is based on Prometheus (metrics collection), Grafana (visualization), Loki (log aggregation), and AlertManager (alert routing). All four are open-source tools with extensive documentation and community support.

Monitoring Anapaya appliances has the following technical requirements:

  1. The monitoring host must be able to reach the management interface of the target appliances.

  2. Firewall rules must allow the monitoring host to open HTTP(S) connections to the telemetry port of each appliance.
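Before configuring anything, you can check both requirements from the monitoring host. The following minimal Python sketch tries to open a TCP connection to an appliance's telemetry port. The function name can_reach and the example address are illustrative and not part of any Anapaya tooling. A successful connection shows that routing and firewall rules allow the traffic, but not that the appliance is actually serving metrics.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only checks reachability (routing and firewall rules); it does
    not verify that the appliance exposes metrics on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all end up here.
        return False


# Hypothetical example call (management IP and telemetry port of an appliance):
# can_reach("192.0.2.10", 42001)
```

If a firewall silently drops the traffic, the call returns False after the timeout; if the port is closed, it returns False immediately.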

Enabling Telemetry in the Appliance

To collect telemetry data, enable telemetry export in the appliance.

  1. In the appliance configuration of the monitored host, verify that a telemetry entry exists, as shown in the snippet below. The address field sets where the appliance exposes its metrics, in the format ip:port. The default is 0.0.0.0:42001, that is, port 42001 on all interfaces. Prometheus scrapes the metrics from this endpoint.

    {
      "management": {
        "telemetry": {
          "address": "<address>"
        }
      }
    }
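If you keep appliance configurations as JSON files, a quick check like the following sketch can confirm that the telemetry entry is present and extract the port to use in Prometheus. It assumes the configuration structure shown above; the helper telemetry_address is hypothetical. Note that 0.0.0.0 means the appliance listens on all IPv4 interfaces, so in Prometheus you must target the appliance's management IP address or hostname, not 0.0.0.0.

```python
import json


def telemetry_address(config: dict) -> tuple[str, int]:
    """Return (ip, port) from management.telemetry.address.

    Raises ValueError if the telemetry entry or its address is missing,
    i.e. telemetry export has not been configured yet.
    """
    telemetry = config.get("management", {}).get("telemetry")
    if not telemetry or "address" not in telemetry:
        raise ValueError("management.telemetry.address is not set")
    # Split on the last colon so the port is always the final field;
    # brackets around an IPv6 literal such as "[::1]" are stripped.
    ip, _, port = telemetry["address"].rpartition(":")
    return ip.strip("[]"), int(port)


# Hypothetical configuration using the default telemetry address.
config = json.loads('{"management": {"telemetry": {"address": "0.0.0.0:42001"}}}')
print(telemetry_address(config))  # ('0.0.0.0', 42001)
```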
    

Setting up Prometheus

To set up Prometheus, follow the official Prometheus instructions. In particular:

  1. Ensure you have the latest version of Prometheus installed. Consult the installation guide for reference.

  2. Follow the instructions in the section "Configure Prometheus to monitor the sample targets" of the official Prometheus documentation.

    • To monitor the appliances, add the following job to the scrape_configs section of the prometheus.yml file, with one entry under targets for each monitored host. The appliance address is the one configured in the telemetry section of the appliance configuration, as shown in Enabling Telemetry in the Appliance. It has the format host:port, where host is an IP address or a hostname that the monitoring host can reach.

      - job_name: 'anapaya-appliance'
        honor_labels: true
        metric_relabel_configs: # Add this config if you are using the Anapaya Grafana dashboards.
          - source_labels: ['hostname']
            target_label: 'shortname'
          - source_labels: ['__name__']
            regex: 'target_info'
            action: drop
        static_configs:
          - targets:
              - <appliance address>
            labels:
              product: <product>
      

    Note

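    If you use the recommended Grafana dashboards, make sure you set the product label correctly, as the dashboards rely on it.

    The available labels are: core, edge, gate, ca. Labels in a static_configs entry apply to every target in that entry, so list appliances of different products in separate entries, each with its own product label.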

  3. Start Prometheus. The exact command depends on the installation method; with the binary distribution, for example, it is ./prometheus --config.file=prometheus.yml.
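When many appliances are monitored, it can help to generate the job instead of editing it by hand. The sketch below builds the same job as a Python dictionary, rejects unknown product labels, and groups targets by product, because Prometheus applies the labels of a static_configs entry to every target in it. The function name scrape_job and the example addresses are assumptions; convert the result to YAML (for example with PyYAML's yaml.safe_dump) before adding it under scrape_configs.

```python
import json

# Product labels expected by the Anapaya Grafana dashboards.
VALID_PRODUCTS = {"core", "edge", "gate", "ca"}


def scrape_job(appliances: dict[str, str]) -> dict:
    """Build the 'anapaya-appliance' scrape job.

    appliances maps "host:port" telemetry addresses to product labels.
    One static_configs entry is emitted per product, since the labels of
    an entry apply to all targets listed in it.
    """
    by_product: dict[str, list[str]] = {}
    for target, product in appliances.items():
        if product not in VALID_PRODUCTS:
            raise ValueError(f"{target}: unknown product label {product!r}")
        by_product.setdefault(product, []).append(target)
    return {
        "job_name": "anapaya-appliance",
        "honor_labels": True,
        "metric_relabel_configs": [
            {"source_labels": ["hostname"], "target_label": "shortname"},
            {"source_labels": ["__name__"], "regex": "target_info", "action": "drop"},
        ],
        "static_configs": [
            {"targets": targets, "labels": {"product": product}}
            for product, targets in sorted(by_product.items())
        ],
    }


# Hypothetical appliances: two edge appliances and one core appliance.
job = scrape_job({
    "10.0.0.11:42001": "edge",
    "10.0.0.12:42001": "edge",
    "10.0.0.21:42001": "core",
})
print(json.dumps(job, indent=2))
```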

Recording and Alerting Rules

Prometheus supports two kinds of rules: recording rules, which precompute expressions and store the results as new time series, and alerting rules, which generate alerts when an expression returns results. Rules are defined in rule files referenced from the rule_files section of prometheus.yml. Prometheus sends firing alerts to AlertManager, which forwards them to your alerting system. For each alert, you can specify the triggering condition, how long it must hold before firing, labels such as severity, and annotations such as a summary and description. The example below defines two alerts: one for a scrape target that is down, and one for a SCION interface that is down.

groups:
  - name: anapaya-appliance  # example group name; choose your own
    rules:
      - alert: SystemServiceDown
        expr: up == 0
        for: 1m
      - alert: SCIONInterfaceStateDown
        expr: router_interface_up == 0
        for: 1m
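The for: 1m clause means an alert does not fire the first time its expression matches. The alert first becomes pending and only fires once the condition has held at every evaluation for at least one minute; if the condition clears earlier, the alert resets. The following simplified Python model illustrates this behavior for hypothetical up samples evaluated every 15 seconds. It is not Prometheus code and ignores details such as evaluation timing and keep_firing_for.

```python
def alert_states(samples, hold_s=60):
    """Replay (timestamp_s, value) samples and return the alert state
    ("inactive", "pending" or "firing") at each evaluation.

    Simplified model of a rule like `expr: up == 0` with `for: 1m`.
    """
    active_since = None
    states = []
    for ts, value in samples:
        if value == 0:
            # The expression matches: start (or continue) the pending period.
            if active_since is None:
                active_since = ts
            states.append("firing" if ts - active_since >= hold_s else "pending")
        else:
            # The condition cleared: the alert resets.
            active_since = None
            states.append("inactive")
    return states


# Hypothetical `up` samples: the target goes down at t=15 s and recovers at t=90 s.
samples = [(0, 1), (15, 0), (30, 0), (45, 0), (60, 0), (75, 0), (90, 1)]
print(alert_states(samples))
# ['inactive', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive']
```

In this model, a target must be down for a full minute before SystemServiceDown fires, so a single failed scrape does not trigger a notification.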

For the metrics that Prometheus can scrape from the appliance, see Telemetry.

Setting up Grafana

To set up Grafana, follow the official Grafana instructions. Before you start, make sure that Prometheus is set up and running as described in Setting up Prometheus. At a high level, you need to:

  1. Install Grafana.

  2. Log in to Grafana.

  3. Configure Prometheus as a datasource for Grafana.

  4. Start building queries and dashboards. To create dashboards programmatically, you can use Grafonnet, a Jsonnet library; this blog post explains how the library works.
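Step 3 can be done in the Grafana UI or through Grafana's HTTP API. The sketch below builds, but does not send, a request to the /api/datasources endpoint that registers Prometheus as a data source. It assumes that Grafana and Prometheus run on the monitoring host on their default ports (3000 and 9090) and that basic authentication with a Grafana admin account is enabled; the function name and credentials are placeholders.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url: str, prometheus_url: str,
                       user: str, password: str) -> urllib.request.Request:
    """Build a POST /api/datasources request that adds Prometheus to Grafana."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus directly.
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )


# Hypothetical local setup; replace the password placeholder with your own.
req = datasource_request("http://localhost:3000", "http://localhost:9090",
                         "admin", "<password>")
# urllib.request.urlopen(req) would send the request.
```

The same request with "type": "loki" and the URL of your Loki instance adds Loki as a data source, which Setting up Loki also requires.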

Setting up Loki

Monitoring Host

We use Loki to collect the logs from each monitored host. Set up Loki on the monitoring host by following the official instructions.

Then add Loki as a data source in Grafana, as explained in the instructions.

Monitored Hosts

On each host that you want to monitor, add the following snippet to the management/telemetry section of the appliance configuration. Replace the fields in <> with the values for that host:

  • url is the URL of the Loki instance. The appliance sends its logs to this URL, and the logs can also be accessed there.

  • username and password are the basic-authentication credentials that the appliance uses to authenticate to Loki.

  • address is the telemetry address already configured in Enabling Telemetry in the Appliance; keep its existing value.

{
  "management": {
    "telemetry": {
      "address": "<address>",
      "logging": {
        "logging_type": "LOKI",
        "loki": {
          "basic_auth": {
            "password": "<password>",
            "username": "<username>"
          },
          "url": "<url>"
        }
      }
    }
  }
}
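To roll the logging entry out to many hosts, you can merge it into each host's existing configuration instead of editing every file by hand. The Python sketch below adds the logging block while keeping the existing telemetry address. The helper with_loki_logging, the URL, and the credentials are hypothetical; use the same url value you would otherwise enter manually.

```python
import copy
import json


def with_loki_logging(config: dict, url: str, username: str, password: str) -> dict:
    """Return a copy of an appliance config with Loki log export enabled.

    Existing keys under management/telemetry (such as 'address') are kept;
    only the 'logging' entry is added or replaced.
    """
    merged = copy.deepcopy(config)  # leave the caller's config untouched
    telemetry = merged.setdefault("management", {}).setdefault("telemetry", {})
    telemetry["logging"] = {
        "logging_type": "LOKI",
        "loki": {
            "basic_auth": {"password": password, "username": username},
            "url": url,
        },
    }
    return merged


# Hypothetical values for one monitored host.
current = {"management": {"telemetry": {"address": "10.0.0.11:42001"}}}
updated = with_loki_logging(current, "https://loki.example.net", "appliance", "<password>")
print(json.dumps(updated, indent=2))
```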

Setting up AlertManager

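We use AlertManager to route the alerts generated by the Prometheus alerting rules (see Recording and Alerting Rules) to your notification channels. Follow the official instructions on setting up AlertManager, and list it in the alerting section of prometheus.yml so that Prometheus sends alerts to it. As stated in the official documentation, AlertManager can be integrated with a wide variety of alerting systems, including email, Slack, and Opsgenie.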
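Once AlertManager is running, you can check the notification routing without waiting for a real failure by posting a synthetic alert to AlertManager's v2 API endpoint /api/v2/alerts. The sketch below only builds the request. It assumes AlertManager listens on its default port 9093 on the monitoring host; the function name and the alert name MonitoringTestAlert are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(alertmanager_url: str) -> urllib.request.Request:
    """Build a POST /api/v2/alerts request carrying one synthetic alert."""
    alert = {
        "labels": {"alertname": "MonitoringTestAlert", "severity": "info"},
        "annotations": {"summary": "Test alert from the monitoring host"},
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }
    return urllib.request.Request(
        f"{alertmanager_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps([alert]).encode(),  # the endpoint expects a list of alerts
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local AlertManager on its default port.
req = build_test_alert("http://localhost:9093")
# urllib.request.urlopen(req) would send the alert.
```

If the alert reaches your email, Slack, or Opsgenie receiver, the routing configuration works end to end.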