Telemetry
Overview
Each Anapaya appliance exposes a telemetry endpoint that can be used to retrieve telemetry data from the appliance.
Tip
To enable telemetry of the appliance, the telemetry endpoint needs to be configured in the Management section of the appliance configuration.
The telemetry data is exported in the form of Prometheus metrics. Prometheus is an open-source systems monitoring and alerting tool. It collects and stores metrics as time series data alongside optional key-value pairs called labels. A metric is a numeric measurement of a specific event or condition, e.g., the number of packets sent on a specific interface. Recording metrics as time series enables higher-level insights: for example, the rate of change of the sent-packet counter yields the throughput of the interface. Labels add further dimensions to a metric, e.g., the name of the interface for which the packet count is collected.
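As an illustration of how a rate is derived from a counter, the following is a minimal sketch in Python. It is not how Prometheus itself computes rates (the PromQL `rate()` function additionally extrapolates to the window boundaries); the function name `counter_rate` and the sample values are assumptions made for this example.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (UNIX timestamp in seconds, counter value)


def counter_rate(samples: Sequence[Sample]) -> float:
    """Average per-second increase of a counter over the sampled window.

    A drop in the value is treated as a counter reset (e.g. after a
    restart); the value seen after the reset is counted as the increase.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed


# Hypothetical samples of router_output_bytes_total for one interface,
# taken 15 seconds apart.
samples = [(0.0, 1_000_000.0), (15.0, 1_750_000.0), (30.0, 2_500_000.0)]
throughput_bps = counter_rate(samples) * 8  # bytes/s -> bits/s
print(f"{throughput_bps:.0f} bit/s")  # prints "400000 bit/s"
```

In a Prometheus setup the same calculation would normally be expressed as a query on the server rather than in client code.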
Each Anapaya appliance internally has several modules that expose some of their internal states as metrics. Each module manages a particular part of the system, such as the SCION control plane, the SCION data plane, or the IP-in-SCION tunneling service. For each module, we list the exposed metrics, their names, the type of the metric, a brief description, and the attached labels. Please refer to the individual sections below for more information.
To access these metrics, you need a Prometheus server that ingests them from each appliance. Setting up a Prometheus server to collect appliance metrics is outside the scope of this document; please refer to the Prometheus Getting Started guide for more information. Should you require assistance with integrating appliance metrics into your monitoring setup, please contact Anapaya’s customer support team (customer-support@anapaya.net).
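For quick inspection without a Prometheus server, the text returned by a metrics endpoint can also be parsed directly. The sketch below is a simplified parser for the Prometheus text exposition format; it ignores escaped quotes inside label values and any trailing timestamps. The sample payload, including the label values, is invented for illustration, and the address of the endpoint depends on your appliance configuration.

```python
import re
from typing import Dict, List, NamedTuple


class MetricSample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float


# metric_name{label="value",...} value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> List[MetricSample]:
    """Parse exposition-format text, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append(MetricSample(name, labels, float(value)))
    return samples


# Hypothetical payload; real label values will differ.
SAMPLE = """\
# HELP router_interface_up 1 indicates the interface is up, 0 otherwise.
# TYPE router_interface_up gauge
router_interface_up{interface="1",isd_as="1-ff00:0:110",neighbor_isd_as="1-ff00:0:111"} 1
router_interface_up{interface="2",isd_as="1-ff00:0:110",neighbor_isd_as="1-ff00:0:112"} 0
node_load1 0.42
"""

down = [
    s.labels["interface"]
    for s in parse_metrics(SAMPLE)
    if s.name == "router_interface_up" and s.value == 0
]
print(down)  # prints "['2']"
```

For anything beyond ad hoc checks, a full Prometheus server or an established client library is the more robust choice.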
Control Plane Metrics
control_beaconing_originated_beacons_total
Description
Total number of beacons originated.
Type
counter
Labels
egress_interface
result
control_beaconing_propagated_beacons_total
Description
Total number of beacons propagated.
Type
counter
Labels
start_isd_as
ingress_interface
egress_interface
result
control_beaconing_received_beacons_total
Description
Total number of beacons received.
Type
counter
Labels
ingress_interface
neighbor_isd_as
result
control_beaconing_registered_segments_total
Description
Total number of segments registered.
Type
counter
Labels
start_isd_as
ingress_interface
seg_type
result
control_segment_expiration_deficient
Description
Indicates whether the expiration time of the segment is below the configured maximum. This happens when the signer expiration time is lower than the maximum segment expiration time.
Type
gauge
Labels
None
control_segment_lookup_requests_total
Description
Total number of path segment requests received.
Type
counter
Labels
dst_isd
seg_type
result
control_segment_registry_segments_received_total
Description
Total number of path segments received through registrations.
Type
counter
Labels
src
seg_type
result
renewal_ca_health_status
Description
Exposes the status of the CA (available, unavailable, starting, stopping), if the host acts as CA and is delegating certificate renewal to the CA service.
Type
gauge
Labels
status
renewal_handled_requests_total
Description
Total number of renewal requests served by each handler type (legacy, in-process, delegating).
Type
counter
Labels
result
type
renewal_received_requests_total
Description
Total number of renewal requests served.
Type
counter
Labels
result
renewal_registered_handlers
Description
Exposes which handler type (legacy, in-process, delegating) is registered.
Type
gauge
Labels
type
trustengine_latest_trc_not_after_time_seconds
Description
The not_after time of the latest TRC for the local ISD in seconds since UNIX epoch.
Type
gauge
Labels
None
trustengine_latest_trc_not_before_time_seconds
Description
The not_before time of the latest TRC for the local ISD in seconds since UNIX epoch.
Type
gauge
Labels
None
trustengine_latest_trc_serial_number
Description
The serial number of the latest TRC for the local ISD.
Type
gauge
Labels
None
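Because the TRC validity metrics are expressed in seconds since the UNIX epoch, they can be compared directly with the current time. The following sketch shows one way to turn them into a "days until expiry" figure; the function names and the 30-day warning threshold are assumptions chosen for this example, not recommended values.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400
WARN_DAYS = 30  # hypothetical alerting threshold


def days_until_expiry(not_after: float, now: Optional[float] = None) -> float:
    """Days remaining until the given UNIX timestamp (negative if passed)."""
    if now is None:
        now = time.time()
    return (not_after - now) / SECONDS_PER_DAY


def in_validity_window(
    not_before: float, not_after: float, now: Optional[float] = None
) -> bool:
    """True if `now` lies between not_before and not_after (inclusive)."""
    if now is None:
        now = time.time()
    return not_before <= now <= not_after


# Hypothetical values of trustengine_latest_trc_not_before_time_seconds
# and trustengine_latest_trc_not_after_time_seconds.
now = 1_700_000_000.0
not_before = now - 10 * SECONDS_PER_DAY
not_after = now + 20 * SECONDS_PER_DAY

remaining = days_until_expiry(not_after, now=now)
if in_validity_window(not_before, not_after, now=now) and remaining < WARN_DAYS:
    print(f"TRC expires in {remaining:.0f} days")  # prints "TRC expires in 20 days"
```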
Data Plane Metrics
router_dropped_pkts_total
Description
Total number of packets dropped.
Type
counter
Labels
interface
isd_as
neighbor_isd_as
type
router_input_bytes_total
Description
Total number of bytes received.
Type
counter
Labels
interface
isd_as
neighbor_isd_as
router_input_pkts_total
Description
Total number of packets received.
Type
counter
Labels
interface
isd_as
neighbor_isd_as
router_interface_up
Description
1 indicates the interface is up, 0 otherwise.
Type
gauge
Labels
interface
isd_as
neighbor_isd_as
router_output_bytes_total
Description
Total number of bytes sent.
Type
counter
Labels
interface
isd_as
neighbor_isd_as
router_output_pkts_total
Description
Total number of packets sent.
Type
counter
Labels
interface
isd_as
neighbor_isd_as
dataplane_control_dataplane_sync_error
Description
Indicates whether the last dataplane sync had an error (1) or not (0).
Type
gauge
Labels
None
IP-in-SCION Tunneling Metrics
gateway_flow_exporter_cleanup_run_time
Description
Overall time the flow cleanup has been running, in seconds.
Type
gauge
Labels
None
gateway_flow_exporter_export_run_time
Description
Overall time the flow exporting has been running, in seconds.
Type
gauge
Labels
None
gateway_flow_exporter_last_cleanup_time
Description
The timestamp up until which the finished flows were deleted. Seconds since UNIX epoch.
Type
gauge
Labels
None
gateway_flow_exporter_last_export_time
Description
The timestamp of the last successful export of the flow metrics. Measured in seconds since UNIX epoch.
Type
gauge
Labels
None
gateway_flow_exporter_lost
Description
The cumulative duration (in seconds) during which the flow exporter has lost flow data.
Type
counter
Labels
None
gateway_ippkt_bytes_local_received_total
Description
Total IP packet bytes received from the local network.
Type
counter
Labels
None
gateway_ippkt_bytes_local_sent_total
Description
Total IP packet bytes sent to the local network.
Type
counter
Labels
isd_as
gateway_ippkt_bytes_received_total
Description
Total IP packet bytes received from remote gateways.
Type
counter
Labels
isd_as
remote_isd_as
gateway_ippkt_bytes_sent_total
Description
Total IP packet bytes sent to remote gateways.
Type
counter
Labels
isd_as
remote_isd_as
domain
traffic_class
path_filter
remote_address
gateway_ippkts_discarded_total
Description
Total number of discarded IP packets received from the local network.
Type
counter
Labels
reason
gateway_ippkts_local_received_total
Description
Total number of IP packets received from the local network.
Type
counter
Labels
None
gateway_ippkts_local_sent_total
Description
Total number of IP packets sent to the local network.
Type
counter
Labels
isd_as
gateway_ippkts_received_total
Description
Total number of IP packets received from remote gateways.
Type
counter
Labels
isd_as
remote_isd_as
gateway_ippkts_sent_total
Description
Total number of IP packets sent to remote gateways.
Type
counter
Labels
isd_as
remote_isd_as
domain
traffic_class
path_filter
remote_address
gateway_netlink_listener_subscribed
Description
Flag reflecting whether the netlink listener is subscribed to route updates.
Type
gauge
Labels
None
gateway_netlink_listener_updates_errors_total
Description
Total number of netlink route update errors.
Type
counter
Labels
None
gateway_paths_monitored
Description
Total number of paths being monitored by the gateway.
Type
gauge
Labels
isd_as
remote_isd_as
gateway_prefix_fetch_errors_total
Description
Total number of errors fetching prefixes via SGRP.
Type
counter
Labels
isd_as
remote_isd_as
remote_address
gateway_prefix_fetch_invalid_total
Description
Total number of invalid prefixes received via SGRP.
Type
gauge
Labels
isd_as
remote_isd_as
remote_address
gateway_prefixes_advertised
Description
Total number of IP prefixes advertised over SGRP.
Type
gauge
Labels
isd_as
remote_isd_as
remote_address
gateway_prefixes_fetched
Description
Total number of IP prefixes fetched via SGRP.
Type
gauge
Labels
isd_as
remote_isd_as
remote_address
gateway_remote_discovery_errors_total
Description
Total number of errors discovering remote gateways.
Type
counter
Labels
isd_as
remote_isd_as
gateway_remote_discovery_paths_available
Description
Total number of SCION paths available to the remote gateway discovery.
Type
gauge
Labels
isd_as
remote_isd_as
status
gateway_remotes
Description
Total number of discovered remote gateways.
Type
gauge
Labels
isd_as
remote_isd_as
gateway_remotes_changes
Description
The number of times the number of remotes changed.
Type
counter
Labels
isd_as
remote_isd_as
gateway_session_is_healthy
Description
Flag reflecting session healthiness.
Type
gauge
Labels
isd_as
remote_isd_as
remote_address
path_filter
domain
gateway_session_latest_path_expiration
Description
Latest path expiration per session monitor.
Type
gauge
Labels
isd_as
remote_isd_as
remote_address
path_filter
domain
gateway_session_path_changes
Description
Number of path changes per session monitor.
Type
counter
Labels
isd_as
remote_isd_as
remote_address
path_filter
domain
gateway_session_paths_available
Description
Total number of paths available per session.
Type
gauge
Labels
isd_as
remote_isd_as
remote_address
path_filter
domain
status
gateway_session_state_changes
Description
Number of state changes per session monitor.
Type
counter
Labels
isd_as
remote_isd_as
remote_address
path_filter
domain
gateway_sgrp_paths_available
Description
Total number of paths available for SGRP per remote gateway.
Type
gauge
Labels
remote_isd_as
remote_address
status
gateway_state_reconfiguration_duration_seconds_total
Description
Overall duration of all state reconfigurations. The component label is one of ‘planning’, ‘dataplane’ or ‘controlplane’.
Type
counter
Labels
component
Appliance Cluster Metrics
appliance_controller_enforcer_license_expiry
Description
Time when the current license expires or when the current trial/grace period ends.
Type
gauge
Labels
None
nodesync_topology_fetch_errors_total
Description
The number of errors when fetching topology information from a remote node.
Type
counter
Labels
remote
nodesync_topology_merge_interface_conflicts_total
Description
The number of topology merge conflicts. This indicates a severe misconfiguration of appliances. It means that multiple appliances have the same interfaces configured.
Type
counter
Labels
isd_as
interface
nodesync_topology_merge_service_conflicts_total
Description
The number of topology merge conflicts. This indicates a severe misconfiguration of appliances. It means that multiple appliances have services configured with the same configuration.
Type
counter
Labels
service
isd_as
shard
Installer Metrics
appliance_installer_checksum_consistent
Description
Whether the checksum of the installed package matches the checksum in the package signature file. This check may fail if a different package with the same version number was uploaded but has not been installed.
Type
gauge
Labels
pkgtype
appliance_installer_controller_watchdog_errors_total
Description
Total number of errors encountered by the appliance controller watchdog. If this counter increases, the installer logs should be inspected for more details.
Type
counter
Labels
None
appliance_installer_installed_package_versions
Description
The versions of the installed scion and system packages.
Type
gauge
Labels
pkgtype
version
appliance_installer_metastore_inconsistent
Description
Whether the appliance installer’s metastore is in an inconsistent state. Value is 1 if the metastore is in an inconsistent state, 0 otherwise.
Type
gauge
Labels
None
appliance_installer_rollback_installations_total
Description
Total number of rollback installations. Result label is the result of the installation.
Type
counter
Labels
result
appliance_installer_scion_installations_total
Description
Total number of scion package installations. Result label is the result of the installation.
Type
counter
Labels
result
appliance_installer_system_installations_total
Description
Total number of system package installations. Result label is the result of the installation.
Type
counter
Labels
result
BGP Metrics
BGP metrics are metrics from the BGP daemon (FRR).
frr_bgp_peer_groups_count_total
Description
Number of peer groups configured.
Type
gauge
Labels
vrf
afi
safi
local_as
frr_bgp_peer_groups_memory_bytes
Description
Memory consumed by peer groups.
Type
gauge
Labels
vrf
afi
safi
local_as
frr_bgp_peer_message_received_total
Description
Number of received messages.
Type
counter
Labels
vrf
afi
safi
local_as
peer
peer_as
frr_bgp_peer_message_sent_total
Description
Number of sent messages.
Type
counter
Labels
vrf
afi
safi
local_as
peer
peer_as
frr_bgp_peer_prefixes_advertised_count_total
Description
Number of prefixes advertised.
Type
gauge
Labels
vrf
afi
safi
local_as
peer
peer_as
frr_bgp_peer_prefixes_received_count_total
Description
Number of prefixes received.
Type
gauge
Labels
vrf
afi
safi
local_as
peer
peer_as
frr_bgp_peer_state
Description
State of the peer (2 = Administratively Down, 1 = Established, 0 = Down).
Type
gauge
Labels
vrf
afi
safi
local_as
peer
peer_as
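The numeric encoding of `frr_bgp_peer_state` can be made readable with a small lookup. The sketch below uses only the three values documented above; the helper names and the sample peers (addresses from the documentation range 192.0.2.0/24) are invented for illustration.

```python
from typing import Dict, List, Tuple

# Value-to-name mapping as documented for frr_bgp_peer_state.
PEER_STATES = {0: "Down", 1: "Established", 2: "Administratively Down"}


def describe_peer_state(value: float) -> str:
    """Human-readable name for a frr_bgp_peer_state sample value."""
    # Float keys such as 1.0 match the int keys because 1.0 == 1 in Python.
    return PEER_STATES.get(value, f"Unknown ({value})")


def unexpectedly_down(samples: List[Tuple[Dict[str, str], float]]) -> List[str]:
    """Peers reporting Down (0), excluding administratively disabled ones."""
    return [labels["peer"] for labels, value in samples if value == 0]


# Hypothetical samples: (labels, value) pairs.
samples = [
    ({"peer": "192.0.2.1", "peer_as": "64512"}, 1.0),
    ({"peer": "192.0.2.2", "peer_as": "64513"}, 0.0),
    ({"peer": "192.0.2.3", "peer_as": "64514"}, 2.0),
]

for labels, value in samples:
    print(labels["peer"], describe_peer_state(value))
print(unexpectedly_down(samples))  # prints "['192.0.2.2']"
```

Treating state 2 separately avoids raising alarms for peers that an operator has deliberately shut down.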
frr_bgp_peer_types_up
Description
Total number of peer types that are up.
Type
gauge
Labels
type
afi
safi
frr_bgp_peer_uptime_seconds
Description
How long the peer has been up, in seconds.
Type
gauge
Labels
vrf
afi
safi
local_as
peer
peer_as
frr_bgp_peers_count_total
Description
Number of peers configured.
Type
gauge
Labels
vrf
afi
safi
local_as
frr_bgp_peers_memory_bytes
Description
Memory consumed by peers.
Type
gauge
Labels
vrf
afi
safi
local_as
frr_bgp_rib_count_total
Description
Number of routes in the RIB.
Type
gauge
Labels
vrf
afi
safi
local_as
frr_bgp_rib_memory_bytes
Description
Memory consumed by the RIB.
Type
gauge
Labels
vrf
afi
safi
local_as
Host Metrics
Host metrics are metrics from the host itself, such as CPU usage, memory consumption or network traffic on the physical network ports.
node_cpu_seconds_total
Description
Seconds the CPU spends in each mode.
Type
counter
Labels
cpu
mode
node_load1
Description
1 minute load average.
Type
gauge
Labels
None
node_load5
Description
5 minute load average.
Type
gauge
Labels
None
node_load15
Description
15 minute load average.
Type
gauge
Labels
None
node_memory_MemTotal_bytes
Description
Total amount of memory in the node.
Type
gauge
Labels
None
node_memory_MemAvailable_bytes
Description
Amount of available memory in the node.
Type
gauge
Labels
None
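The two memory gauges are typically combined into a utilisation figure. The following is a minimal sketch of that arithmetic; the function name and the sample values are assumptions for this example.

```python
def memory_used_fraction(total_bytes: float, available_bytes: float) -> float:
    """Fraction of memory in use, from MemTotal and MemAvailable."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 1.0 - available_bytes / total_bytes


GIB = 1024 ** 3

# Hypothetical values of node_memory_MemTotal_bytes and
# node_memory_MemAvailable_bytes.
used = memory_used_fraction(16 * GIB, 4 * GIB)
print(f"{used:.0%} of memory in use")  # prints "75% of memory in use"
```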
node_filesystem_size_bytes
Description
Filesystem size in bytes.
Type
gauge
Labels
device
fstype
mountpoint
node_filesystem_avail_bytes
Description
Filesystem available bytes.
Type
gauge
Labels
device
fstype
mountpoint
node_network_receive_bytes_total
Description
Number of bytes received from the network.
Type
counter
Labels
device
node_network_transmit_bytes_total
Description
Number of bytes transmitted to the network.
Type
counter
Labels
device