Telemetry

Overview

Each Anapaya appliance exposes a telemetry endpoint that can be used to retrieve telemetry data from the appliance.

Tip

To enable telemetry of the appliance, the telemetry endpoint needs to be configured in the Management section of the appliance configuration.

The telemetry data is exported in the form of Prometheus metrics. Prometheus is an open-source systems monitoring and alerting tool. It collects and stores metrics as time series data alongside optional key-value pairs called labels. A metric is a numeric measurement of a specific event or condition, e.g., the number of packets sent on a specific interface. Recording metrics as time series then provides higher-level insights: for example, the rate of change of the sent packet counter gives the packet throughput of the interface. Labels add dimensions to a metric, e.g., the name of the interface for which the packet count is collected is attached as a label.
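To make the rate-of-change idea concrete, the following is a minimal sketch of how a per-second rate can be derived from two samples of a counter. The `Sample` type, the function name, and the sample values are illustrative assumptions, not part of the appliance or of Prometheus. A Prometheus server normally performs this calculation itself, over more than two samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One scraped value of a counter at a point in time (hypothetical type)."""
    timestamp: float  # seconds since UNIX epoch
    value: float      # cumulative counter value


def per_second_rate(older: Sample, newer: Sample) -> float:
    """Average per-second increase of a counter between two samples."""
    elapsed = newer.timestamp - older.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = newer.value - older.value
    if delta < 0:
        # A counter only decreases when the exporting process restarts.
        # Simplification: count the newer value as the increase since the reset.
        delta = newer.value
    return delta / elapsed


# Made-up samples of a sent-packet counter for one interface, 60 seconds apart.
first = Sample(timestamp=1000.0, value=5000.0)
second = Sample(timestamp=1060.0, value=11000.0)
print(per_second_rate(first, second))  # 100.0 packets per second
```

Applying the same calculation to a byte counter and multiplying by 8 yields a throughput in bits per second.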

Each Anapaya appliance internally has several modules that expose some of their internal states as metrics. Each module manages a particular part of the system, such as the SCION control plane, the SCION data plane, or the IP-in-SCION tunneling service. For each module, we list the exposed metrics, their names, the type of the metric, a brief description, and the attached labels. Please refer to the individual sections below for more information.

To access these metrics, a Prometheus server is needed that ingests the metrics from each appliance. How to set up a Prometheus server to collect appliance metrics is outside the scope of this document. Please refer to the Prometheus Getting Started guide for more information. Should you require assistance with integrating appliance metrics in your monitoring setup, please contact Anapaya’s customer support team (customer-support@anapaya.net).
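For quick inspection without a Prometheus server, the text that a Prometheus endpoint returns can also be parsed directly. The sketch below parses the Prometheus text exposition format into name, labels, and value. It is a simplified illustration: it does not unescape label values, it ignores optional timestamps, and the label values in the example text are made up.

```python
import re

# metric_name{label="value",...} value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser cannot read
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Hypothetical scrape output; the ISD-AS and interface values are invented.
EXAMPLE = """\
# HELP router_interface_up 1 indicates the interface is up, 0 otherwise.
# TYPE router_interface_up gauge
router_interface_up{interface="1",isd_as="1-ff00:0:110",neighbor_isd_as="1-ff00:0:111"} 1
router_input_pkts_total{interface="1",isd_as="1-ff00:0:110",neighbor_isd_as="1-ff00:0:111"} 42
"""

for name, labels, value in parse_metrics(EXAMPLE):
    print(name, labels["interface"], value)
```

For production monitoring, a Prometheus server or an established client library is the more robust choice, since it handles the full format.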

Control Plane Metrics

control_beaconing_originated_beacons_total

Description

Total number of beacons originated.

Type

counter

Labels

egress_interface result

control_beaconing_propagated_beacons_total

Description

Total number of beacons propagated.

Type

counter

Labels

start_isd_as ingress_interface egress_interface result

control_beaconing_received_beacons_total

Description

Total number of beacons received.

Type

counter

Labels

ingress_interface neighbor_isd_as result

control_beaconing_registered_segments_total

Description

Total number of segments registered.

Type

counter

Labels

start_isd_as ingress_interface seg_type result

control_segment_expiration_deficient

Description

Indicates whether the expiration time of the segment is below the configured maximum. This happens when the signer expiration time is lower than the maximum segment expiration time.

Type

gauge

Labels

None

control_segment_lookup_requests_total

Description

Total number of path segment requests received.

Type

counter

Labels

dst_isd seg_type result

control_segment_registry_segments_received_total

Description

Total number of path segments received through registrations.

Type

counter

Labels

src seg_type result

renewal_ca_health_status

Description

Exposes the status of the CA (available, unavailable, starting, stopping), if the host acts as CA and is delegating certificate renewal to the CA service.

Type

gauge

Labels

status

renewal_handled_requests_total

Description

Total number of renewal requests served by each handler type (legacy, in-process, delegating).

Type

counter

Labels

result type

renewal_received_requests_total

Description

Total number of renewal requests served.

Type

counter

Labels

result

renewal_registered_handlers

Description

Exposes which handler type (legacy, in-process, delegating) is registered.

Type

gauge

Labels

type

trustengine_latest_trc_not_after_time_seconds

Description

The not_after time of the latest TRC for the local ISD in seconds since UNIX epoch.

Type

gauge

Labels

None

trustengine_latest_trc_not_before_time_seconds

Description

The not_before time of the latest TRC for the local ISD in seconds since UNIX epoch.

Type

gauge

Labels

None
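Because both TRC timestamps are expressed in seconds since UNIX epoch, they can be compared directly with the current time. The following is a minimal sketch of such a check. The function name and the returned keys are assumptions made for illustration.

```python
import time

SECONDS_PER_DAY = 86400.0


def trc_validity(not_before, not_after, now=None):
    """Check a TRC validity window given as UNIX timestamps in seconds."""
    now = time.time() if now is None else now
    return {
        # True while the current time lies inside the validity window.
        "valid": not_before <= now <= not_after,
        # Negative once the TRC has expired.
        "days_remaining": (not_after - now) / SECONDS_PER_DAY,
    }


# Made-up values: a window of 10 days, evaluated 3 days after its start.
status = trc_validity(not_before=0.0, not_after=10 * SECONDS_PER_DAY,
                      now=3 * SECONDS_PER_DAY)
print(status)  # {'valid': True, 'days_remaining': 7.0}
```

An alert on a small `days_remaining` value gives operators time to react before the latest TRC expires.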

trustengine_latest_trc_serial_number

Description

The serial number of the latest TRC for the local ISD.

Type

gauge

Labels

None

Data Plane Metrics

router_dropped_pkts_total

Description

Total number of packets dropped.

Type

counter

Labels

interface isd_as neighbor_isd_as type

router_input_bytes_total

Description

Total number of bytes received.

Type

counter

Labels

interface isd_as neighbor_isd_as

router_input_pkts_total

Description

Total number of packets received.

Type

counter

Labels

interface isd_as neighbor_isd_as

router_interface_up

Description

1 indicates the interface is up, 0 otherwise.

Type

gauge

Labels

interface isd_as neighbor_isd_as

router_output_bytes_total

Description

Total number of bytes sent.

Type

counter

Labels

interface isd_as neighbor_isd_as

router_output_pkts_total

Description

Total number of packets sent.

Type

counter

Labels

interface isd_as neighbor_isd_as

dataplane_control_dataplane_sync_error

Description

Indicates whether the last dataplane sync had an error (1) or not (0).

Type

gauge

Labels

None

IP-in-SCION Tunneling Metrics

gateway_domain_traffic_matcher_sessions_total

Description

The number of live sessions per traffic matcher in a domain.

Type

gauge

Labels

domain traffic_matcher

gateway_flow_exporter_cleanup_run_time

Description

Overall time the flow cleanup has been running, in seconds.

Type

gauge

Labels

None

gateway_flow_exporter_export_run_time

Description

Overall time the flow exporting has been running, in seconds.

Type

gauge

Labels

None

gateway_flow_exporter_last_cleanup_time

Description

The timestamp up to which finished flows have been deleted. Measured in seconds since UNIX epoch.

Type

gauge

Labels

None

gateway_flow_exporter_last_export_time

Description

The timestamp of the last successful export of the flow metrics. Measured in seconds since UNIX epoch.

Type

gauge

Labels

None

gateway_flow_exporter_lost

Description

The cumulative duration (in seconds) for which the flow exporter has lost flow data.

Type

counter

Labels

None

gateway_ippkt_bytes_local_received_total

Description

Total IP packet bytes received from the local network.

Type

counter

Labels

None

gateway_ippkt_bytes_local_sent_total

Description

Total IP packet bytes sent to the local network.

Type

counter

Labels

isd_as

gateway_ippkt_bytes_received_total

Description

Total IP packet bytes received from remote gateways.

Type

counter

Labels

isd_as remote_isd_as

gateway_ippkt_bytes_sent_total

Description

Total IP packet bytes sent to remote gateways.

Type

counter

Labels

isd_as remote_isd_as domain traffic_class path_filter remote_address frame_type

gateway_ippkts_discarded_total

Description

Total number of discarded IP packets received from the local network.

Type

counter

Labels

reason

gateway_ippkts_local_received_total

Description

Total number of IP packets received from the local network.

Type

counter

Labels

None

gateway_ippkts_local_sent_total

Description

Total number of IP packets sent to the local network.

Type

counter

Labels

isd_as

gateway_ippkts_received_total

Description

Total number of IP packets received from remote gateways.

Type

counter

Labels

isd_as remote_isd_as

gateway_ippkts_sent_total

Description

Total number of IP packets sent to remote gateways.

Type

counter

Labels

isd_as remote_isd_as domain traffic_class path_filter remote_address frame_type

gateway_netlink_listener_subscribed

Description

Flag reflecting whether the netlink listener is subscribed to route updates.

Type

gauge

Labels

object

gateway_netlink_listener_updates_errors_total

Description

Total number of netlink route update errors.

Type

counter

Labels

object

gateway_path_fetch_errors_total

Description

Total number of errors fetching paths from the daemon.

Type

counter

Labels

isd_as

gateway_paths_monitored

Description

Total number of paths being monitored by the gateway.

Type

gauge

Labels

isd_as remote_isd_as

gateway_ping_reachability_changes

Description

The number of times the reachability of the gateway changed.

Type

counter

Labels

isd_as remote_isd_as remote_address interface_group

gateway_ping_reachable

Description

Whether the gateway is reachable via a specific SCION interface group.

Type

gauge

Labels

isd_as remote_isd_as remote_address interface_group

gateway_ping_received_total

Description

Total number of probe replies received from remote gateways.

Type

counter

Labels

isd_as remote_isd_as remote_address interface_group

gateway_ping_sent_total

Description

Total number of probes sent to remote gateways.

Type

counter

Labels

isd_as remote_isd_as remote_address interface_group
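The two probe counters can be combined into a loss ratio for a given label set. The sketch below assumes two scrapes of each counter taken at the same two points in time. The function name and the sample values are illustrative, and replies that are still in flight at scrape time can skew the result slightly.

```python
def probe_loss_ratio(sent_before, sent_after, received_before, received_after):
    """Fraction of probes without a reply between two scrapes.

    Returns None when no probes were sent in the interval.
    """
    sent = sent_after - sent_before
    received = received_after - received_before
    if sent <= 0:
        return None
    # Replies to earlier probes may arrive late, so clamp at zero.
    lost = max(sent - received, 0)
    return lost / sent


# Made-up scrapes: 100 probes sent and 90 replies received in the interval.
print(probe_loss_ratio(100, 200, 95, 185))  # 0.1
```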

gateway_prefix_fetch_errors_total

Description

Total number of errors fetching prefixes via SGRP.

Type

counter

Labels

isd_as remote_isd_as remote_address

gateway_prefix_fetch_invalid_total

Description

Total number of invalid prefixes received via SGRP.

Type

gauge

Labels

isd_as remote_isd_as remote_address

gateway_prefixes_advertised

Description

Total number of IP prefixes advertised over SGRP.

Type

gauge

Labels

isd_as remote_isd_as remote_address

gateway_prefixes_fetched

Description

Total number of IP prefixes fetched via SGRP.

Type

gauge

Labels

isd_as remote_isd_as remote_address

gateway_remote_discovery_errors_total

Description

Total number of errors discovering remote gateways.

Type

counter

Labels

isd_as remote_isd_as

gateway_remote_discovery_paths_available

Description

Total number of SCION paths available to the remote gateway discovery.

Type

gauge

Labels

isd_as remote_isd_as status

gateway_remotes

Description

Total number of discovered remote gateways.

Type

gauge

Labels

isd_as remote_isd_as

gateway_remotes_changes

Description

The number of times the number of remotes changed.

Type

counter

Labels

isd_as remote_isd_as

gateway_seccom_egress_sa_expiration

Description

The timestamp at which the current SAs expire. Measured in seconds since UNIX epoch.

Type

gauge

Labels

isd_as remote_isd_as remote_address domain traffic_class

gateway_seccom_egress_sa_last_update

Description

The timestamp at which the current SAs were created. Measured in seconds since UNIX epoch.

Type

gauge

Labels

isd_as remote_isd_as remote_address domain traffic_class

gateway_seccom_egress_sa_update_errors

Description

Total number of failed updates of the egress SAs.

Type

counter

Labels

isd_as remote_isd_as remote_address domain traffic_class

gateway_seccom_egress_sas

Description

Number of egress SAs that are currently configured.

Type

gauge

Labels

isd_as remote_isd_as remote_address domain traffic_class

gateway_seccom_ingress_request_errors_total

Description

Total number of errors processing incoming security communication requests.

Type

counter

Labels

isd_as remote_isd_as remote_address type reason

gateway_seccom_ingress_requests_total

Description

Total number of incoming security communication requests.

Type

counter

Labels

isd_as remote_isd_as remote_address type

gateway_seccom_ingress_sas

Description

Number of ingress SAs that are currently configured.

Type

gauge

Labels

isd_as remote_isd_as remote_address

gateway_session_is_healthy

Description

Flag reflecting session healthiness.

Type

gauge

Labels

isd_as remote_isd_as remote_address path_filter domain

gateway_session_latest_path_expiration

Description

Latest path expiration per session monitor.

Type

gauge

Labels

isd_as remote_isd_as remote_address path_filter domain

gateway_session_path_changes

Description

Number of path changes per session monitor.

Type

counter

Labels

isd_as remote_isd_as remote_address path_filter domain

gateway_session_paths_available

Description

Total number of paths available per session.

Type

gauge

Labels

isd_as remote_isd_as remote_address path_filter domain status

gateway_session_state_changes

Description

Number of state changes per session monitor.

Type

counter

Labels

isd_as remote_isd_as remote_address path_filter domain

gateway_sgrp_paths_available

Description

Total number of paths available for SGRP per remote gateway.

Type

gauge

Labels

remote_isd_as remote_address status

LAN Monitoring Metrics

mole_gateway_alive

Description

Whether the probes to the given gateway are passing through.

Type

gauge

Labels

gateway

mole_gateway_jitter_milliseconds

Description

The latency jitter to the given gateway.

Type

gauge

Labels

gateway

mole_gateway_latency_milliseconds

Description

The RTT latency to the given gateway.

Type

gauge

Labels

gateway

mole_gateway_probes_received_total

Description

Number of probes received from the given gateway.

Type

counter

Labels

gateway

mole_gateway_probes_sent_total

Description

Number of probes sent to the given gateway.

Type

counter

Labels

gateway

Appliance Cluster Metrics

appliance_controller_enforcer_license_expiry

Description

Time when the current license expires or when the current trial/grace period ends.

Type

gauge

Labels

None

nodesync_topology_fetch_errors_total

Description

The number of errors when fetching topology information from a remote node.

Type

counter

Labels

remote

nodesync_topology_merge_interface_conflicts_total

Description

The number of topology merge conflicts. This indicates a severe misconfiguration of appliances. It means that multiple appliances have the same interfaces configured.

Type

counter

Labels

isd_as interface

nodesync_topology_merge_service_conflicts_total

Description

The number of topology merge conflicts. This indicates a severe misconfiguration of appliances. It means that multiple appliances have services configured with the same configuration.

Type

counter

Labels

service isd_as shard

Installer Metrics

appliance_installer_checksum_consistent

Description

Whether the checksum of the installed package matches the checksum in the package signature file. This check may fail if a different package with the same version number was uploaded but has not been installed.

Type

gauge

Labels

pkgtype

appliance_installer_controller_watchdog_errors_total

Description

Total number of errors encountered by the appliance controller watchdog. If this counter increases, the installer logs should be inspected for more details.

Type

counter

Labels

None

appliance_installer_installed_package_versions

Description

The versions of the installed scion and system packages.

Type

gauge

Labels

pkgtype version

appliance_installer_metastore_inconsistent

Description

Whether the appliance installer’s metastore is in an inconsistent state. Value is 1 if the metastore is in an inconsistent state, 0 otherwise.

Type

gauge

Labels

None

appliance_installer_rollback_installations_total

Description

Total number of rollback installations. Result label is the result of the installation.

Type

counter

Labels

result

appliance_installer_scion_installations_total

Description

Total number of scion package installations. Result label is the result of the installation.

Type

counter

Labels

result

appliance_installer_system_installations_total

Description

Total number of system package installations. Result label is the result of the installation.

Type

counter

Labels

result

BGP Metrics

BGP metrics are metrics from the BGP daemon (FRR).

frr_bgp_peer_groups_count_total

Description

Number of peer groups configured.

Type

gauge

Labels

vrf afi safi local_as

frr_bgp_peer_groups_memory_bytes

Description

Memory consumed by peer groups.

Type

gauge

Labels

vrf afi safi local_as

frr_bgp_peer_message_received_total

Description

Number of received messages.

Type

counter

Labels

vrf afi safi local_as peer peer_as

frr_bgp_peer_message_sent_total

Description

Number of sent messages.

Type

counter

Labels

vrf afi safi local_as peer peer_as

frr_bgp_peer_prefixes_advertised_count_total

Description

Number of prefixes advertised.

Type

gauge

Labels

vrf afi safi local_as peer peer_as

frr_bgp_peer_prefixes_received_count_total

Description

Number of prefixes received.

Type

gauge

Labels

vrf afi safi local_as peer peer_as

frr_bgp_peer_state

Description

State of the peer (2 = Administratively Down, 1 = Established, 0 = Down).

Type

gauge

Labels

vrf afi safi local_as peer peer_as
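When displaying this gauge in a dashboard or a script, the numeric value can be translated back into the state names given in the description. A minimal sketch follows; the helper name is hypothetical.

```python
# Mapping taken from the metric description above.
PEER_STATES = {
    0: "Down",
    1: "Established",
    2: "Administratively Down",
}


def describe_peer_state(value):
    """Translate a frr_bgp_peer_state sample value into a readable name."""
    return PEER_STATES.get(int(value), f"Unknown ({value})")


print(describe_peer_state(1.0))  # Established
```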

frr_bgp_peer_types_up

Description

Total number of peer types that are up.

Type

gauge

Labels

type afi safi

frr_bgp_peer_uptime_seconds

Description

How long the peer has been up, in seconds.

Type

gauge

Labels

vrf afi safi local_as peer peer_as

frr_bgp_peers_count_total

Description

Number of peers configured.

Type

gauge

Labels

vrf afi safi local_as

frr_bgp_peers_memory_bytes

Description

Memory consumed by peers.

Type

gauge

Labels

vrf afi safi local_as

frr_bgp_rib_count_total

Description

Number of routes in the RIB.

Type

gauge

Labels

vrf afi safi local_as

frr_bgp_rib_memory_bytes

Description

Memory consumed by the RIB.

Type

gauge

Labels

vrf afi safi local_as

Host Metrics

Host metrics are metrics from the host itself, such as CPU usage, memory consumption or network traffic on the physical network ports.

node_cpu_seconds_total

Description

Seconds the CPU spends in each mode.

Type

counter

Labels

cpu mode
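Since this counter accumulates seconds per mode, CPU utilization over an interval can be derived from the change in each mode between two scrapes. The sketch below assumes the samples for one `cpu` label value have been collected into dictionaries keyed by `mode`, and that one of the modes is named `idle`. The function name and the values are illustrative.

```python
def cpu_busy_fraction(before, after):
    """Fraction of time a CPU was not idle between two scrapes.

    `before` and `after` map a mode name to the cumulative seconds
    reported for that mode. Returns None if no time elapsed.
    """
    total = sum(after[mode] - before.get(mode, 0.0) for mode in after)
    if total <= 0:
        return None
    idle = after.get("idle", 0.0) - before.get("idle", 0.0)
    return 1.0 - idle / total


# Made-up scrapes for one CPU: 60 s idle, 30 s user, 10 s system in between.
before = {"idle": 100.0, "user": 20.0, "system": 10.0}
after = {"idle": 160.0, "user": 50.0, "system": 20.0}
print(round(cpu_busy_fraction(before, after), 2))  # 0.4
```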

node_load1

Description

1 minute load average.

Type

gauge

Labels

None

node_load5

Description

5 minute load average.

Type

gauge

Labels

None

node_load15

Description

15 minute load average.

Type

gauge

Labels

None

node_memory_MemTotal_bytes

Description

Total amount of memory in the node.

Type

gauge

Labels

None

node_memory_MemAvailable_bytes

Description

Amount of available memory in the node.

Type

gauge

Labels

None

node_filesystem_size_bytes

Description

Filesystem size in bytes.

Type

gauge

Labels

device fstype mountpoint

node_filesystem_avail_bytes

Description

Filesystem available bytes.

Type

gauge

Labels

device fstype mountpoint

node_network_receive_bytes_total

Description

Number of bytes received from the network.

Type

counter

Labels

device

node_network_transmit_bytes_total

Description

Number of bytes transmitted to the network.

Type

counter

Labels

device