Telemetry
Overview
Each Anapaya appliance exposes a telemetry endpoint that can be used to retrieve telemetry data from the appliance.
Tip
To enable telemetry of the appliance, the telemetry endpoint needs to be configured in the Management section of the appliance configuration.
The telemetry data is exported in the form of Prometheus metrics. Prometheus is an open-source systems monitoring and alerting tool. It collects and stores metrics as time series data alongside optional key-value pairs called labels. A metric is a numeric measurement of a specific event or condition, e.g., the number of packets sent on a specific interface. Recording a metric as a time series makes higher-level insights possible: for example, the rate of change of the sent-packet counter yields the throughput of the interface. Labels add dimensions to a metric, e.g., the name of the interface for which the packet count is collected is attached as a label.
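As a rough illustration of how a rate is derived from a counter, the sketch below computes the average per-second increase between two samples of a byte counter. The `Sample` type, the `counter_rate` helper, and the sample values are hypothetical and only serve this example; in practice a Prometheus server performs this calculation for you, with more careful handling of counter resets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One scraped reading of a counter (hypothetical helper type)."""
    timestamp: float  # seconds since UNIX epoch
    value: float      # cumulative counter value at that time


def counter_rate(older: Sample, newer: Sample) -> float:
    """Return the average per-second increase between two counter samples.

    A counter only increases, except when the exporting process restarts and
    the counter starts again from zero. In that case the newer reading is
    taken as the whole increase, which is a simplification.
    """
    elapsed = newer.timestamp - older.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be ordered by increasing timestamp")
    increase = newer.value - older.value
    if increase < 0:  # counter reset between the two scrapes
        increase = newer.value
    return increase / elapsed


# Two made-up scrapes of a byte counter for one interface, 30 seconds apart.
first = Sample(timestamp=1_700_000_000.0, value=1_500_000.0)
second = Sample(timestamp=1_700_000_030.0, value=4_500_000.0)

bytes_per_second = counter_rate(first, second)
print(f"{bytes_per_second * 8 / 1e6:.2f} Mbit/s")  # prints "0.80 Mbit/s"
```

The same calculation applies to any metric of type counter listed below, such as packet or byte totals.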
Each Anapaya appliance internally has several modules that expose some of their internal states as metrics. Each module manages a particular part of the system, such as the SCION control plane, the SCION data plane, or the IP-in-SCION tunneling service. For each module, we list the exposed metrics, their names, the type of the metric, a brief description, and the attached labels. Please refer to the individual sections below for more information.
To access these metrics, a Prometheus server is needed that ingests the metrics from each appliance. How to set up a Prometheus server to collect appliance metrics is outside the scope of this document. Please refer to the Prometheus Getting Started guide for more information. Should you require assistance with integrating appliance metrics into your monitoring setup, please contact Anapaya’s customer support team (customer-support@anapaya.net).
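For quick inspection without a Prometheus server, the text returned by a metrics endpoint can also be read directly. The following is a minimal sketch of a parser for the Prometheus text exposition format. It assumes label values contain no escaped quotes, and the sample text is made up for illustration. How the text is fetched is left out because the endpoint address depends on your appliance configuration.

```python
import re
from typing import Dict, List, Tuple

# metric_name{label="value",...} value   (the label block is optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Simplification: label values must not contain escaped double quotes.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

ParsedSample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[ParsedSample]:
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Made-up endpoint output; real label values depend on the deployment.
EXAMPLE = """\
# TYPE router_input_pkts_total counter
router_input_pkts_total{interface="1",isd_as="1-ff00:0:110",neighbor_isd_as="1-ff00:0:111"} 1234
router_interface_up{interface="1",isd_as="1-ff00:0:110",neighbor_isd_as="1-ff00:0:111"} 1
dataplane_control_dataplane_sync_error 0
"""

for name, labels, value in parse_metrics(EXAMPLE):
    print(name, labels, value)
```

For production monitoring, an established Prometheus client or server is a better choice than a hand-written parser.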
Control Plane Metrics
control_beaconing_originated_beacons_total
Description
Total number of beacons originated.
Type
counter
Labels
egress_interface, result
control_beaconing_propagated_beacons_total
Description
Total number of beacons propagated.
Type
counter
Labels
start_isd_as, ingress_interface, egress_interface, result
control_beaconing_received_beacons_total
Description
Total number of beacons received.
Type
counter
Labels
ingress_interface, neighbor_isd_as, result
control_beaconing_registered_segments_total
Description
Total number of segments registered.
Type
counter
Labels
start_isd_as, ingress_interface, seg_type, result
control_segment_expiration_deficient
Description
Indicates whether the expiration time of the segment is below the configured maximum. This happens when the signer expiration time is lower than the maximum segment expiration time.
Type
gauge
Labels
None
control_segment_lookup_requests_total
Description
Total number of path segment requests received.
Type
counter
Labels
dst_isd, seg_type, result
control_segment_registry_segments_received_total
Description
Total number of path segments received through registrations.
Type
counter
Labels
src, seg_type, result
renewal_ca_health_status
Description
Exposes the status of the CA (available, unavailable, starting, stopping), if the host acts as CA and is delegating certificate renewal to the CA service.
Type
gauge
Labels
status
renewal_handled_requests_total
Description
Total number of renewal requests served by each handler type (legacy, in-process, delegating).
Type
counter
Labels
result, type
renewal_received_requests_total
Description
Total number of renewal requests served.
Type
counter
Labels
result
renewal_registered_handlers
Description
Exposes which handler type (legacy, in-process, delegating) is registered.
Type
gauge
Labels
type
trustengine_latest_trc_not_after_time_seconds
Description
The not_after time of the latest TRC for the local ISD in seconds since UNIX epoch.
Type
gauge
Labels
None
trustengine_latest_trc_not_before_time_seconds
Description
The not_before time of the latest TRC for the local ISD in seconds since UNIX epoch.
Type
gauge
Labels
None
trustengine_latest_trc_serial_number
Description
The serial number of the latest TRC for the local ISD.
Type
gauge
Labels
None
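Because trustengine_latest_trc_not_after_time_seconds is expressed in seconds since UNIX epoch, the remaining validity of the latest TRC follows from subtracting the current time. The sketch below shows that arithmetic. The `describe_expiry` helper and the 30-day warning threshold are assumptions made for this example, not recommendations from Anapaya.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def describe_expiry(not_after_seconds: float, now_seconds: float,
                    warn_days: float = 30.0) -> str:
    """Summarize how long the TRC remains valid.

    not_after_seconds: value of trustengine_latest_trc_not_after_time_seconds.
    now_seconds: current time in seconds since UNIX epoch.
    warn_days: hypothetical threshold below which expiry is flagged.
    """
    remaining_days = (not_after_seconds - now_seconds) / SECONDS_PER_DAY
    expiry = datetime.fromtimestamp(not_after_seconds, tz=timezone.utc)
    if remaining_days < 0:
        status = "expired"
    elif remaining_days < warn_days:
        status = "expiring soon"
    else:
        status = "ok"
    return (f"{status}: TRC valid until {expiry.isoformat()} "
            f"({remaining_days:.1f} days left)")


# Made-up values: the TRC expires ten days after the current time.
now = 1_700_000_000.0
print(describe_expiry(now + 10 * SECONDS_PER_DAY, now))
```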
Data Plane Metrics
router_dropped_pkts_total
Description
Total number of packets dropped.
Type
counter
Labels
interface, isd_as, neighbor_isd_as, type
router_input_bytes_total
Description
Total number of bytes received.
Type
counter
Labels
interface, isd_as, neighbor_isd_as
router_input_pkts_total
Description
Total number of packets received.
Type
counter
Labels
interface, isd_as, neighbor_isd_as
router_interface_up
Description
1 indicates the interface is up, 0 otherwise.
Type
gauge
Labels
interface, isd_as, neighbor_isd_as
router_output_bytes_total
Description
Total number of bytes sent.
Type
counter
Labels
interface, isd_as, neighbor_isd_as
router_output_pkts_total
Description
Total number of packets sent.
Type
counter
Labels
interface, isd_as, neighbor_isd_as
dataplane_control_dataplane_sync_error
Description
Indicates whether the last dataplane sync had an error (1) or not (0).
Type
gauge
Labels
None
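The router_interface_up gauge lends itself to a simple check: any sample with value 0 identifies an interface that is not up. The sketch below filters such samples by their labels. The `interfaces_down` helper and the sample data are hypothetical; the input shape (a labels dictionary plus a value) is an assumption about how the samples were collected.

```python
from typing import Dict, List, Tuple

# One sample of router_interface_up: its labels and its value (1 = up, 0 = down).
InterfaceSample = Tuple[Dict[str, str], float]


def interfaces_down(samples: List[InterfaceSample]) -> List[Tuple[str, str]]:
    """Return sorted (interface, neighbor_isd_as) pairs whose gauge value is 0."""
    down = [
        (labels["interface"], labels["neighbor_isd_as"])
        for labels, value in samples
        if value == 0
    ]
    return sorted(down)


# Made-up samples; real label values depend on the appliance configuration.
samples = [
    ({"interface": "1", "isd_as": "1-ff00:0:110",
      "neighbor_isd_as": "1-ff00:0:111"}, 1.0),
    ({"interface": "2", "isd_as": "1-ff00:0:110",
      "neighbor_isd_as": "1-ff00:0:112"}, 0.0),
]

for interface, neighbor in interfaces_down(samples):
    print(f"interface {interface} towards {neighbor} is down")
```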
IP-in-SCION Tunneling Metrics
gateway_flow_exporter_cleanup_run_time
Description
Overall time the flow cleanup has been running, in seconds.
Type
gauge
Labels
None
gateway_flow_exporter_export_run_time
Description
Overall time the flow exporting has been running, in seconds.
Type
gauge
Labels
None
gateway_flow_exporter_last_cleanup_time
Description
The timestamp up until which the finished flows were deleted. Seconds since UNIX epoch.
Type
gauge
Labels
None
gateway_flow_exporter_last_export_time
Description
The timestamp of the last successful export of the flow metrics, in seconds since UNIX epoch.
Type
gauge
Labels
None
gateway_flow_exporter_lost
Description
The cumulative duration, in seconds, for which the flow exporter has lost flow data.
Type
counter
Labels
None
gateway_ippkt_bytes_local_received_total
Description
Total IP packet bytes received from the local network.
Type
counter
Labels
None
gateway_ippkt_bytes_local_sent_total
Description
Total IP packet bytes sent to the local network.
Type
counter
Labels
isd_as
gateway_ippkt_bytes_received_total
Description
Total IP packet bytes received from remote gateways.
Type
counter
Labels
isd_as, remote_isd_as
gateway_ippkt_bytes_sent_total
Description
Total IP packet bytes sent to remote gateways.
Type
counter
Labels
isd_as, remote_isd_as, domain, traffic_class, path_filter, remote_address
gateway_ippkts_discarded_total
Description
Total number of discarded IP packets received from the local network.
Type
counter
Labels
reason
gateway_ippkts_local_received_total
Description
Total number of IP packets received from the local network.
Type
counter
Labels
None
gateway_ippkts_local_sent_total
Description
Total number of IP packets sent to the local network.
Type
counter
Labels
isd_as
gateway_ippkts_received_total
Description
Total number of IP packets received from remote gateways.
Type
counter
Labels
isd_as, remote_isd_as
gateway_ippkts_sent_total
Description
Total number of IP packets sent to remote gateways.
Type
counter
Labels
isd_as, remote_isd_as, domain, traffic_class, path_filter, remote_address
gateway_netlink_listener_subscribed
Description
Flag reflecting whether the netlink listener is subscribed to route updates.
Type
gauge
Labels
object
gateway_netlink_listener_updates_errors_total
Description
Total number of netlink route update errors.
Type
counter
Labels
object
gateway_paths_monitored
Description
Total number of paths being monitored by the gateway.
Type
gauge
Labels
isd_as, remote_isd_as
gateway_prefix_fetch_errors_total
Description
Total number of errors fetching prefixes via SGRP.
Type
counter
Labels
isd_as, remote_isd_as, remote_address
gateway_prefix_fetch_invalid_total
Description
Total number of invalid prefixes received via SGRP.
Type
gauge
Labels
isd_as, remote_isd_as, remote_address
gateway_prefixes_advertised
Description
Total number of IP prefixes advertised over SGRP.
Type
gauge
Labels
isd_as, remote_isd_as, remote_address
gateway_prefixes_fetched
Description
Total number of IP prefixes fetched via SGRP.
Type
gauge
Labels
isd_as, remote_isd_as, remote_address
gateway_remote_discovery_errors_total
Description
Total number of errors discovering remote gateways.
Type
counter
Labels
isd_as, remote_isd_as
gateway_remote_discovery_paths_available
Description
Total number of SCION paths available to the remote gateway discovery.
Type
gauge
Labels
isd_as, remote_isd_as, status
gateway_remotes
Description
Total number of discovered remote gateways.
Type
gauge
Labels
isd_as, remote_isd_as
gateway_remotes_changes
Description
The number of times the remotes number changed.
Type
counter
Labels
isd_as, remote_isd_as
gateway_session_is_healthy
Description
Flag reflecting session healthiness.
Type
gauge
Labels
isd_as, remote_isd_as, remote_address, path_filter, domain
gateway_session_latest_path_expiration
Description
Latest path expiration per session monitor.
Type
gauge
Labels
isd_as, remote_isd_as, remote_address, path_filter, domain
gateway_session_path_changes
Description
Number of path changes per session monitor.
Type
counter
Labels
isd_as, remote_isd_as, remote_address, path_filter, domain
gateway_session_paths_available
Description
Total number of paths available per session.
Type
gauge
Labels
isd_as, remote_isd_as, remote_address, path_filter, domain, status
gateway_session_state_changes
Description
Number of state changes per session monitor.
Type
counter
Labels
isd_as, remote_isd_as, remote_address, path_filter, domain
gateway_sgrp_paths_available
Description
Total number of paths available for SGRP per remote gateway.
Type
gauge
Labels
remote_isd_as, remote_address, status
LAN Monitoring Metrics
mole_gateway_alive
Description
Whether the probes to the given gateway are passing through.
Type
gauge
Labels
gateway
mole_gateway_jitter_milliseconds
Description
The latency jitter to the given gateway.
Type
gauge
Labels
gateway
mole_gateway_latency_milliseconds
Description
The RTT latency to the given gateway.
Type
gauge
Labels
gateway
mole_gateway_probes_received_total
Description
Number of probes received from the given gateway.
Type
counter
Labels
gateway
mole_gateway_probes_sent_total
Description
Number of probes sent to the given gateway.
Type
counter
Labels
gateway
Appliance Cluster Metrics
appliance_controller_enforcer_license_expiry
Description
Time when the current license expires or when the current trial/grace period ends.
Type
gauge
Labels
None
nodesync_topology_fetch_errors_total
Description
The number of errors when fetching topology information from a remote node.
Type
counter
Labels
remote
nodesync_topology_merge_interface_conflicts_total
Description
The number of topology merge conflicts. This indicates a severe misconfiguration of appliances. It means that multiple appliances have the same interfaces configured.
Type
counter
Labels
isd_as, interface
nodesync_topology_merge_service_conflicts_total
Description
The number of topology merge conflicts. This indicates a severe misconfiguration of appliances. It means that multiple appliances have services configured with the same configuration.
Type
counter
Labels
service, isd_as, shard
Installer Metrics
appliance_installer_checksum_consistent
Description
Whether the checksum of the installed package matches the checksum in the package signature file. This may fail if a different package with the same version number was uploaded but has not been installed.
Type
gauge
Labels
pkgtype
appliance_installer_controller_watchdog_errors_total
Description
Total number of errors encountered by the appliance controller watchdog. If this counter increases, the installer logs should be inspected for more details.
Type
counter
Labels
None
appliance_installer_installed_package_versions
Description
The versions of the installed scion and system packages.
Type
gauge
Labels
pkgtype, version
appliance_installer_metastore_inconsistent
Description
Whether the appliance installer’s metastore is in an inconsistent state. Value is 1 if the metastore is in an inconsistent state, 0 otherwise.
Type
gauge
Labels
None
appliance_installer_rollback_installations_total
Description
Total number of rollback installations. Result label is the result of the installation.
Type
counter
Labels
result
appliance_installer_scion_installations_total
Description
Total number of scion package installations. Result label is the result of the installation.
Type
counter
Labels
result
appliance_installer_system_installations_total
Description
Total number of system package installations. Result label is the result of the installation.
Type
counter
Labels
result
BGP Metrics
BGP metrics are metrics from the BGP daemon (FRR).
frr_bgp_peer_groups_count_total
Description
Number of peer groups configured.
Type
gauge
Labels
vrf, afi, safi, local_as
frr_bgp_peer_groups_memory_bytes
Description
Memory consumed by peer groups.
Type
gauge
Labels
vrf, afi, safi, local_as
frr_bgp_peer_message_received_total
Description
Number of received messages.
Type
counter
Labels
vrf, afi, safi, local_as, peer, peer_as
frr_bgp_peer_message_sent_total
Description
Number of sent messages.
Type
counter
Labels
vrf, afi, safi, local_as, peer, peer_as
frr_bgp_peer_prefixes_advertised_count_total
Description
Number of prefixes advertised.
Type
gauge
Labels
vrf, afi, safi, local_as, peer, peer_as
frr_bgp_peer_prefixes_received_count_total
Description
Number of prefixes received.
Type
gauge
Labels
vrf, afi, safi, local_as, peer, peer_as
frr_bgp_peer_state
Description
State of the peer (2 = Administratively Down, 1 = Established, 0 = Down).
Type
gauge
Labels
vrf, afi, safi, local_as, peer, peer_as
frr_bgp_peer_types_up
Description
Total number of peer types that are up.
Type
gauge
Labels
type, afi, safi
frr_bgp_peer_uptime_seconds
Description
How long the peer has been up, in seconds.
Type
gauge
Labels
vrf, afi, safi, local_as, peer, peer_as
frr_bgp_peers_count_total
Description
Number of peers configured.
Type
gauge
Labels
vrf, afi, safi, local_as
frr_bgp_peers_memory_bytes
Description
Memory consumed by peers.
Type
gauge
Labels
vrf, afi, safi, local_as
frr_bgp_rib_count_total
Description
Number of routes in the RIB.
Type
gauge
Labels
vrf, afi, safi, local_as
frr_bgp_rib_memory_bytes
Description
Memory consumed by the RIB.
Type
gauge
Labels
vrf, afi, safi, local_as
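The numeric values of frr_bgp_peer_state are easier to read when mapped back to their names. The sketch below encodes the three values given in the description of that metric. The `PeerState` enum and the `describe_peer_state` helper are hypothetical names introduced for this example.

```python
from enum import IntEnum


class PeerState(IntEnum):
    """Values of frr_bgp_peer_state as listed in the metric description."""
    DOWN = 0
    ESTABLISHED = 1
    ADMINISTRATIVELY_DOWN = 2


def describe_peer_state(value: float) -> str:
    """Translate a gauge value into a state name, or flag it as unknown."""
    try:
        number = int(value)
        if number != value:  # reject non-integral readings such as 1.5
            raise ValueError(value)
        return PeerState(number).name
    except (ValueError, OverflowError):
        return f"UNKNOWN({value})"


# Made-up gauge readings for three peers.
for reading in (1.0, 0.0, 2.0):
    print(reading, "->", describe_peer_state(reading))
```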
Host Metrics
Host metrics are metrics from the host itself, such as CPU usage, memory consumption or network traffic on the physical network ports.
node_cpu_seconds_total
Description
Seconds the CPU spends in each mode.
Type
counter
Labels
cpu, mode
node_load1
Description
1 minute load average.
Type
gauge
Labels
None
node_load5
Description
5 minute load average.
Type
gauge
Labels
None
node_load15
Description
15 minute load average.
Type
gauge
Labels
None
node_memory_MemTotal_bytes
Description
Total amount of memory in the node.
Type
gauge
Labels
None
node_memory_MemAvailable_bytes
Description
Amount of available memory in the node.
Type
gauge
Labels
None
node_filesystem_size_bytes
Description
Filesystem size in bytes.
Type
gauge
Labels
device, fstype, mountpoint
node_filesystem_avail_bytes
Description
Filesystem available bytes.
Type
gauge
Labels
device, fstype, mountpoint
node_network_receive_bytes_total
Description
Number of bytes received from the network.
Type
counter
Labels
device
node_network_transmit_bytes_total
Description
Number of bytes transmitted to the network.
Type
counter
Labels
device
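Several host gauges come in total/available pairs, from which a utilization figure can be derived: node_memory_MemTotal_bytes with node_memory_MemAvailable_bytes, and node_filesystem_size_bytes with node_filesystem_avail_bytes. The sketch below shows the arithmetic with made-up readings. The `used_fraction` helper is a hypothetical name introduced for this example.

```python
def used_fraction(total_bytes: float, available_bytes: float) -> float:
    """Return the fraction of a resource in use, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 1.0 - available_bytes / total_bytes


# Made-up readings: 16 GiB of memory with 4 GiB available,
# and a 100 GB filesystem with 35 GB available.
memory_used = used_fraction(16 * 1024**3, 4 * 1024**3)
filesystem_used = used_fraction(100e9, 35e9)

print(f"memory used: {memory_used:.0%}")          # prints "memory used: 75%"
print(f"filesystem used: {filesystem_used:.0%}")  # prints "filesystem used: 65%"
```

For filesystems, pair the two gauges only when their device, fstype, and mountpoint labels match.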