Appliance Release v0.40

This page contains the release notes for the v0.40 Anapaya appliance software release. The appliance software release is applicable for the following Anapaya products:

  • Anapaya CORE

  • Anapaya EDGE

  • Anapaya GATE

We recommend always upgrading to the latest available patch release. Please refer to Upgrade Notes (if any) of each release if there are any special steps to be taken when upgrading. For general information on how to upgrade your appliance, please refer to Appliance Update Guide.

Observability

This software release also contains updated Grafana dashboards and alerting rules. Please follow Recommended Alert Rules and Recommended Grafana Dashboards for more information on how to update your monitoring setup.

Upgrade Notes

Note

This release requires the installation of the anapaya-system package v2.17.0. The installer will automatically reject the installation if the required anapaya-system package is not available.

Warning

The new appliance configuration management API endpoints allow the API users to retrieve older configurations prior to release v0.39, which introduced the new secret management. In case you do not want users with reader, observer, or writer roles to be able to retrieve the older configurations that contain secrets, you should delete the older configurations.

You can do so by running the following command:

rm -r /etc/anapaya/appliance/configs/v0_3{4,5,6,7,8}

Warning

Release v0.40.0 introduces potentially breaking changes for deployments with very strict firewall rules. Please read the release notes carefully before upgrading. If you are unsure whether the changes will affect your deployment, please contact the Anapaya Support.

v0.40.0 (2025-08-14)

Features

Generic Routing Encapsulation (GRE) support

The appliance now supports GRE tunnels which can be defined in the interfaces section of the appliance configuration.

The GRE configuration accepts common interface properties, e.g., mtu, addresses. Additionally, the source and destination fields are mandatory, and they denote the tunnel endpoints. The source must be an address already configured on another interface.

At this point, we only support L3 point-to-point tunnels. If you require L2 tunnels, please contact the Anapaya Support.

An example configuration for a GRE tunnel looks as follows:

{
	"interfaces": {
		"gres": [
			{
				"name": "gre0",
				"source": "172.20.0.1",
				"destination": "172.21.0.1",
				"addresses": ["192.168.0.10/24"]
			}
		]
	}
}
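The constraints described above (mandatory source and destination, and a source that is already configured on another interface) can be checked offline before pushing a configuration. The following is a minimal sketch, not the appliance's own validation. It assumes that other interfaces are listed under interfaces.ethernets as objects with an addresses array in CIDR notation; the function name gre_source_errors is hypothetical.

```python
import ipaddress


def gre_source_errors(config):
    """Return error messages for GRE tunnels that violate the documented constraints."""
    interfaces = config.get("interfaces", {})

    # Assumption: non-GRE interfaces live under "ethernets" and carry an
    # "addresses" list in CIDR notation, like the GRE example does.
    known = set()
    for eth in interfaces.get("ethernets", []):
        for addr in eth.get("addresses", []):
            known.add(ipaddress.ip_interface(addr).ip)

    errors = []
    for gre in interfaces.get("gres", []):
        name = gre.get("name", "<unnamed>")
        # source and destination are mandatory tunnel endpoints.
        for field in ("source", "destination"):
            if field not in gre:
                errors.append(f"{name}: missing mandatory field {field!r}")
        # The source must already be configured on another interface.
        source = gre.get("source")
        if source is not None and ipaddress.ip_address(source) not in known:
            errors.append(f"{name}: source {source} is not configured on another interface")
    return errors
```

Called on a configuration that contains the gre0 example plus an ethernet interface holding 172.20.0.1/24, the function returns an empty list.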

Shuttle: an experimental SCION-based tunnel

Important

Shuttle is an experimental feature. Make sure that you have a fallback for access to your appliance, if you intend to use it as your primary management access.

The appliance now supports an additional experimental SCION-based VPN called shuttle. The shuttle protocol is based on connect-ip and runs on top of HTTP3-over-SCION. This allows the user to set up an in-band tunnel over SCION that can be used for management access. Shuttle follows a client-server model; the client can be configured in the appliance configuration.

The shuttle client is configured as an interface in the appliance configuration under /interfaces/shuttles/{name}. Detailed configuration information can be found in the Management API specification.

An example configuration for the shuttle client looks as follows:

{
  "name": "shut0",
  "local": "172.20.2.3",
  "servers": [{"endpoint": "1-ff00:0:110,172.20.1.3:52810"}]
}

In case you want to host your own shuttle server, please contact the Anapaya Support.

Improvements

Appliance secrets management

Several improvements to the appliance secret management are introduced to reduce friction that our users have reported after adopting secret management in release v0.39.

The appliance no longer requires sequential versioning of the secrets. You can push a secret {secret-id}@{version} without {secret-id}@{version-1} being present. This allows you to push secrets with the same version across multiple appliances without having to fill in the gaps, for example, because one appliance has recently been added and does not contain the previously rotated out secrets.

The appliance now supports an additional DELETE /v1/secrets/{secret-id} endpoint which you can use to delete unreferenced secrets. The equivalent CLI command is:

appliance-cli delete secrets/{secret-id}@{version}

Appliance configuration management

The Appliance API now supports additional endpoints to manage configurations. We now allow listing previous configurations, and calculating diffs between configurations. Furthermore, we allow retrieving configurations that applied to previous releases of the appliance.

The following endpoints are now available to manage configurations of the current release:

  • GET /api/v1/configs list all configurations (without including the actual configuration)

  • GET /api/v1/configs?include_config=true list all configurations (including the actual configuration)

  • GET /api/v1/configs/latest get the current configuration (same as existing GET /api/v1/config)

  • GET /api/v1/configs/{config-id} retrieve a specific configuration.

  • POST /api/v1/configs create a new configuration (same as existing PUT /api/v1/config)

  • POST /api/v1/configs/{config-id}/reapply reapply a specific configuration (with a new ID).

  • POST /api/v1/configs/{config-id}/diff calculate diff to predecessor configuration.

  • POST /api/v1/configs/{config-id}/diff/{reference-config-id} calculate diff to another configuration.

The corresponding CLI commands are:

  • appliance-cli get configs (add --query include_config=true to include configurations)

  • appliance-cli get configs/latest

  • appliance-cli get configs/{config-id}

  • appliance-cli post configs < appliance.json

  • appliance-cli post configs/{config-id}/reapply

  • appliance-cli post configs/{config-id}/diff

  • appliance-cli post configs/{config-id}/diff/{reference-config-id}

Tip

Use --query include_config=true to include the actual configuration when listing configurations. By default, the configurations are omitted to improve readability of the output.

Tip

The diff endpoint returns a JSON object with metadata and the actual diff. To output only the diff itself, you can use the --raw and --filter flags:

appliance-cli post configs/latest/diff --raw --filter body.diff

The following endpoints are now available to manage configurations of previous releases:

  • GET /api/v1/configs/releases list all previous releases

  • GET /api/v1/configs/releases/{release-id} retrieve a specific release.

  • GET /api/v1/configs/releases/{release-id}/{config-id} retrieve a specific configuration for a specific release.

Detailed information on the additional endpoints can be found in the Management API specification.
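For scripting against the configuration endpoints of the current release, the method and path combinations listed above can be wrapped in a small helper. This is only a sketch: the helper name config_request, the action names, and the base URL are hypothetical, authentication is left out, and the request is built but never sent.

```python
from urllib.request import Request


def config_request(base_url, action, config_id=None, reference_id=None):
    """Build (but do not send) a request for one of the configuration endpoints."""
    base = base_url.rstrip("/") + "/api/v1/configs"
    if action == "list":
        return Request(base, method="GET")
    if action == "list_with_config":
        return Request(base + "?include_config=true", method="GET")
    if action == "get":
        # config_id may also be "latest" for the current configuration.
        return Request(f"{base}/{config_id}", method="GET")
    if action == "reapply":
        return Request(f"{base}/{config_id}/reapply", method="POST")
    if action == "diff":
        # Without a reference, the diff is against the predecessor configuration.
        url = f"{base}/{config_id}/diff"
        if reference_id is not None:
            url += f"/{reference_id}"
        return Request(url, method="POST")
    raise ValueError(f"unknown action: {action}")
```

For example, config_request("https://appliance.example", "diff", "latest") yields a POST request to /api/v1/configs/latest/diff, which mirrors the appliance-cli post configs/latest/diff command.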

Working with JSON configurations can be verbose when editing the configuration manually. To improve the user experience, the appliance can now also handle YAML configurations being pushed on the following endpoints:

  • PUT /api/v1/config

  • POST /api/v1/config/validate

  • POST /api/v1/config/diff

  • POST /api/v1/configs

While the request body can be in YAML format, the response will always be in JSON. When using the appliance CLI, you can use the --format yaml flag to output the configuration in YAML.

Auto-renewal fallback

The appliance now automatically falls back to contacting the certificate authorities listed in the TRC if auto-renewal fails with the issuers explicitly set in the configuration. The fallback issuers are tried in the order in which their certificates appear in the TRC.

To disable this behavior, set /scion/ases*/cppki/disable_issuer_fallback to true.
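The resulting order of renewal attempts can be illustrated as follows. This is a sketch of the behavior described above, not the appliance's implementation. It assumes issuers are identified by plain strings and that a TRC issuer already present in the configured list is not tried a second time; the function name renewal_issuer_order is hypothetical.

```python
def renewal_issuer_order(configured_issuers, trc_issuers, disable_issuer_fallback=False):
    """Return the issuers in the order in which renewal would be attempted."""
    # Explicitly configured issuers always come first.
    order = list(configured_issuers)
    if not disable_issuer_fallback:
        # Fallback issuers follow in the order of their certificates in the TRC.
        for issuer in trc_issuers:
            if issuer not in order:  # assumption: duplicates are skipped
                order.append(issuer)
    return order
```

With disable_issuer_fallback set to True, the returned list contains only the configured issuers.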

Other improvements:

  • The appliance-cli now returns an error if the HTTP request to the appliance API fails. Previously, you had to provide the --fail flag to enable this behavior. The --fail flag is replaced by the --no-fail flag, which can be used to disable this behavior if needed.

  • The appliance now supports a custom number of RX queues on an interface. The default value 0 means that the appliance sets the appropriate number based on the number of workers. Use /interfaces/ethernets/*/vpp/num_rx_queues to configure a custom number of RX queues.

  • The POST /api/v1/cppki/certificates and POST /api/v1/cppki/trcs endpoints now accept PEM encoded certificate chains and TRC bundles that contain trailing whitespace. This allows users to submit certificates and TRCs that may have been copied from sources that include such formatting, without causing errors. Whitespace between PEM blocks is not supported and will still result in an error.

  • The POST /api/v1/cppki/certificates now ignores root certificates included in the PEM encoded certificate chain. This allows users to submit certificate chains that may include root certificates without causing errors. We still recommend that users only submit the AS and CA certificates without the root certificate, as this is the canonical way to keep certificate chains.

  • The appliance now does the following additional configuration validation checks:

    • Check no API listener in /management/api/listeners is listening on port 22 (SSH).

    • Validate that payload encryption is enabled on the endpoint if any domain is configured with encryption. Previously, only the default domain was checked.

    • No longer require API listener /management/api/listeners as long as the UNIX socket is not disabled.

  • The default VPP connection timers are relaxed:

    • /system/vpp/connection/health_check/threshold is now 20 by default (increased from 3).

    • /system/vpp/connection/reconnect_attempts is now 10 by default (increased from 5).

    This should make startup of VPP and services connecting to it more robust, especially in larger appliances like the L and XL types.

  • The appliance now supports setting the tenant ID for Loki in the configuration. You can set the tenant ID under /management/telemetry/logging/loki/tenant_id.

  • The appliance now supports disabling an IP-in-SCION tunneling domain without removing it from the configuration using the new field /scion_tunneling/domains/*/disabled.

  • The appliance health endpoint now includes a check for time synchronization when NTP is configured.

  • The SCION control service now ensures that the TRCs are loaded from disk even if no certificate for the local AS is present. Previously, the TRCs were only loaded on startup, or when a signer was successfully created, which required a certificate for the local AS to be present. This could lead to a failing health check with check_id: 4001-1001 even though the TRCs were present on the appliance.

  • The new HTTP endpoint debug/scion-tunneling/discovery/announcements retrieves the discovery announcements published by the appliance. The result is a map from the local ISD-AS number to the actual announcement as it is passed over the network.

    $ appliance-cli get debug/scion-tunneling/discovery/announcements
    {
      "1-ff00:0:110": {
        "gateways": [
          {
            "allow_interfaces": [1, 2, 3],
            "control_address": "172.20.3.2:40201",
            "data_address": "172.20.3.2:40200",
            "probe_address": "172.20.3.2:40202"
          }
        ]
      }
    }
    
  • The new HTTP endpoint debug/scion-tunneling/sgrp/announcements retrieves the SGRP announcements published by the appliance to remote ASes. The result lists, for each local ISD-AS, the prefixes announced to each remote ISD-AS.

    $ appliance-cli get debug/scion-tunneling/sgrp/announcements
    {
      announcements: [
        {
          local_isd_as: "1-ff00:0:110"
          remotes: [
            {
              prefixes: ["172.20.3.0/24"]
              remote_isd_as: "1-ff00:0:111"
            }
            {
              prefixes: ["172.20.3.0/24"]
              remote_isd_as: "1-ff00:0:112"
            }
          ]
        }
      ]
    }
    
  • The appliance now adds an additional route label on the following metrics:

    • appliance_http_request_total

    • appliance_http_request_latency_seconds

    • appliance_cron_http_request_total

    • appliance_cron_http_request_latency_seconds

    • appliance_installer_http_request_total

    • appliance_installer_http_request_latency_seconds

  • The metric gateway_next_hop_reachable tracks the state of the static announcements with next-hop tracking. Label address contains the tracked IP address and the value is 1 if the next-hop is reachable, 0 otherwise.

    The new alert TunnelingNextHopNotReachable consumes this metric and fires if the next hop becomes unreachable.

  • The frame discarded reason TOO_OLD is removed from the gateway_frames_discarded_total metric and some new counters are added instead. Below is a description of the frames discarded reasons:

    • INVALID: there was a validation/check error in the frame, e.g., truncated packet, wrong version, etc.

    • FRAGMENTS_EVICTED: fragments (partial IP packets) dropped.

    • SEQ_TOO_HIGH: the seq_num of the received frame is higher than the highest expected seq_num in the istream.

    • SEQ_TOO_LOW: the seq_num of the received frame is lower than the highest seq_num seen in the istream.

    • DUPLICATE: (consecutive duplicates) the seq_num of the received frame is the same as the highest seq_num seen in the istream.

    Although it is not possible to identify all network conditions with the above set of counters, we can still give guidance on the most likely reason for some counter patterns:

    • If the number of SEQ_TOO_LOW and SEQ_TOO_HIGH are similar, it most likely means reordered packets.

    • If the number of SEQ_TOO_HIGH is much higher than SEQ_TOO_LOW, it most likely means lost packets.

  • The gateway_ippkt_bytes_received_filtered_total and gateway_ippkts_received_filtered_total metrics now correctly report the packets filtered because of wrong combination of source and destination ISD-ASes. These packets are counted with reason = bad_src_dst_ia.

  • Metric dataplane_control_vrrp_state tracks the state of VRRP instances.

  • The dataplane control now exposes NAT metrics. The following metrics are exposed:

    • dataplane_control_nat_max_sessions: The maximum number of NAT sessions.

    • dataplane_control_nat_sessions_total: The current number of NAT sessions.

    • dataplane_control_nat_translations_total: The number of NAT translations. The labels are direction, processing, and protocol.

    • dataplane_control_nat_drops_total: The number of NAT drops. The labels are direction, and processing.

  • The showpaths_mon_paths metric has been replaced with the appliance_cron_paths_to_neighbors metric in the SCION monitoring dashboards. This change ensures that the dashboards reflect the latest path information and improve the accuracy of the displayed data.

  • The SCIONNeighborPathsMissing alert has been updated to use the appliance_cron_paths_to_neighbors metric instead of showpaths_mon_paths. This change reflects the new metric that provides more accurate path information.

  • The BGPUnexportedRoutes alert has been updated to use the new frr_bgp_peer_prefixes_advertised_expected_diff metric to reduce false positives if /bgp/global/networks are configured. In case of custom FRR templates, the alert can still trigger false positives.
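The guidance given above for the SEQ_TOO_HIGH and SEQ_TOO_LOW reasons of gateway_frames_discarded_total can be turned into a rough heuristic. The sketch below is illustrative only: the function name and both thresholds (similar_ratio and much_higher_factor) are assumptions, not values used by the appliance. Since the metric is a counter, the inputs should be increases over the same time window.

```python
def likely_cause(seq_too_high, seq_too_low, similar_ratio=0.5, much_higher_factor=10.0):
    """Guess the most likely network condition from two discard counter increases."""
    if seq_too_high == 0 and seq_too_low == 0:
        return "none"
    larger = max(seq_too_high, seq_too_low)
    smaller = min(seq_too_high, seq_too_low)
    # Similar counts in both directions most likely mean reordered packets.
    if smaller >= similar_ratio * larger:
        return "reordering"
    # SEQ_TOO_HIGH far above SEQ_TOO_LOW most likely means lost packets.
    if seq_too_high >= much_higher_factor * max(seq_too_low, 1):
        return "loss"
    return "inconclusive"
```

Any other pattern is reported as inconclusive, in line with the caveat that the counters cannot identify all network conditions.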

Fixes

Prevent BGP misconfiguration

The appliance now validates that /bgp/neighbors/*/local_as is not equal to /bgp/global/as. Previously, this misconfiguration was not detected during validation and could result in incorrect BGP setups. With this update, the appliance prevents users from pushing such invalid configurations.

A migration has been added which removes the local AS number from a neighbor if it is equal.
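The effect of this migration can be pictured with a short sketch. It is not the appliance's migration code; it assumes that /bgp/neighbors is a list of objects, and the function name drop_redundant_local_as is hypothetical.

```python
import copy


def drop_redundant_local_as(config):
    """Return a copy of the config where local_as equal to the global AS is removed."""
    migrated = copy.deepcopy(config)  # leave the caller's configuration untouched
    bgp = migrated.get("bgp", {})
    global_as = bgp.get("global", {}).get("as")
    # Assumption: neighbors are a list of objects with an optional local_as field.
    for neighbor in bgp.get("neighbors", []):
        if global_as is not None and neighbor.get("local_as") == global_as:
            del neighbor["local_as"]
    return migrated
```

Neighbors whose local_as differs from the global AS number are left unchanged.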

The FRR template has been adjusted:

--- a/frr.conf.tmpl
+++ b/frr.conf.tmpl
@@ -43,7 +43,7 @@ bfd
 ! BGP configuration
 !
 {{ define "neighbor" }} neighbor {{ .GetNeighborAddress }} remote-as {{ .PeerAs }}
-{{- if .LocalAs }}
+{{- if .SetLocalAs }}
 neighbor {{ .GetNeighborAddress }} local-as {{ .LocalAs }}
 {{- end }}
 {{- if .PlaintextAuthPassword }}

Note

If you have manual template overrides, we automatically replace {{- if .LocalAs }} with {{- if .SetLocalAs }} in your overrides if there is an exact match. If not, you will have to manually adjust your overrides.

Other fixes

  • The POST /cppki/certificates/renew endpoint now correctly uses the configured issuers when the request body does not specify the target issuer explicitly. Previously, the renewal would default to the issuer that has issued the latest certificate chain, rather than the configured issuers. After a failover renewal, this could lead to sending a renewal request to the secondary CA rather than the configured primary CA.

  • The appliance GET /api/v1/health endpoint now correctly reports the sibling interfaces (check_id 3003-2002) as up when configuring interfaces on an already established sibling in the same AS.

    Previously, when an appliance already had a sibling with interfaces in the same AS, and new interfaces were configured on that sibling, the endpoint would incorrectly report the new interfaces as down until there was a BFD state change to that sibling. This was only a reporting issue; connectivity was not affected. For prior releases, you can restart the router:

    appliance-cli post debug/services/router/restart
    
  • The /management/telemetry/flow_metrics/flow_expiration_interval appliance configuration value is now actually used. Its default value is changed to 2m to match the old hard-coded default value.

  • The appliance now respects the timeout query parameter in the packet tracing debug endpoint, and does not prematurely cancel the request after 10 seconds.

  • The appliance no longer attempts to establish an EDGE-to-EDGE encryption session with remotes that do not support payload encryption.

  • Announcing a /0 prefix in IP-in-SCION tunneling is now correctly processed and no longer ignored.

  • The gateway_ping_reachable and gateway_ping_reachability_changes metrics are now populated correctly.

  • The gateway_info_seccom_addresses_fetched metric is now set to 1 when the remote supports payload encryption and to 0 when it does not.

  • Fix a bug that occasionally caused a panic when reading EDGE-to-EDGE encryption metrics from VPP.

  • The throughput_bytes and throughput_frames in debug/scion-tunneling/paths now report the correct values. Previously, they almost always reported 0.

  • The appliance now calculates the correct SCION path MTU in all cases. Previously, if a segment shortcut terminated in an AS where the ingress SCION interface MTU was smaller than the internal MTU, the appliance would unnecessarily restrict the path MTU to the smaller value.

  • Adjust IPv6 related settings on the scion-gateway tun interface to avoid packets such as DAD, RS and/or RA, which resulted in dropped packets on egress as we do not forward IPv6 link-local. This was only a cosmetic issue in the metrics.

  • Fix a memory leak in the IP-in-SCION tunneling component. This was only triggered when /scion_tunneling/static_announcements were configured with next hop tracking enabled.

  • Fix a bug that artificially reported discarded IP packets with reason too_old.

  • Fix a bug that increased the frames discarded with reason too_old even though no frames were actually discarded.

Breaking Changes

Fixed source address for cluster synchronization

The appliance now uses a fixed source address for the cluster synchronization traffic. To fetch topology configuration, it uses the same IP as the IP in /cluster/synchronization/address. To fetch SCION control plane information, such as beacons, segments, and certificates, from a peer appliance, it uses the same IP as the IP in the /scion/ases/*/control/address field.

This is usually the desired behavior, as it allows such traffic to be operated over loopback interfaces. However, this change can potentially break your setup if you have strict firewall rules and the source address changes.