Appliance Release v0.37

This page contains the release notes for the v0.37 Anapaya appliance software release. The appliance software release is applicable for the following Anapaya products:

  • Anapaya CORE

  • Anapaya EDGE

  • Anapaya GATE

We recommend always upgrading to the latest available patch release. Please refer to the Upgrade Notes (if any) of each release for any special steps to take when upgrading. For general information on how to upgrade your appliance, please refer to the Appliance Update Guide.

Upgrade Notes

Warning

In release v0.37.0, new configuration validation checks regarding VPP buffer allocations were added. If your configuration does not comply with the new validation checks, the appliance controller will start but wait for the configuration to be fixed. The logic has been improved in v0.37.2 to also take the available VPP buffers into account. We recommend upgrading directly to v0.37.2 or later.

The affected releases are v0.37.0 and v0.37.1. If you are upgrading to one of these releases, please check the appliance controller logs after the upgrade:

journalctl -u appliance-controller.service -e

Verify that the logs do not contain any log events with the following message:

"validating latest configuration","cause":{"msg":"invalid configuration"...

Such a log message also contains a detailed description of the validation error, pointing out what needs to be fixed in the configuration.
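As a rough aid for the check above, the following Python sketch filters controller log lines for the validation failure. It assumes only that an affected log line contains the two substrings quoted above; the function name find_validation_failures is hypothetical, and the exact log line format may differ on your appliance.

```python
# Substrings taken from the log message quoted in the upgrade notes.
# Everything else about the log line format is an assumption.
MARKERS = ("validating latest configuration", "invalid configuration")


def find_validation_failures(lines):
    """Return the log lines that contain every marker substring."""
    return [
        line.rstrip("\n")
        for line in lines
        if all(marker in line for marker in MARKERS)
    ]
```

One way to use it is to pass the lines printed by journalctl -u appliance-controller.service to the function and treat a non-empty result as a configuration that still needs fixing.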

Warning

In release v0.37.0 and newer, the IP-in-SCION tunneling always needs access to AS certificates of configured local ISD-ASes. For EDGEs this is naturally given, but for a GATE this might previously not have been set up. If you are upgrading a GATE to v0.37.0 or newer, please make sure to provision the GATE instance with the necessary control plane crypto material (AS certificate and TRC).

See Certificate/TRC Provisioning for more information on how to configure the TRC and AS certificate. To get an AS certificate it’s easiest to request it via a sibling appliance.

A new alert GatewayASCertificateExpiresSoon has been added to indicate if the AS certificate expires soon.

Warning

Release v0.34.0 and newer require Ubuntu 22.04 as the system and at least the anapaya-system package with version 2.8.0.

v0.37.5 (2024-11-25)

Warning

When upgrading GATE appliances from an earlier v0.37.x (< v0.37.3) release, you need to manually configure the added management.telemetry.flow_metrics.max_active_flows and system.vpp.statseg.size fields in the appliance configuration. The automatic migration only applies when upgrading from a release older than v0.37.0.

Fixes

  • The IP-in-SCION tunneling process now recovers quickly (within 10 seconds) after all paths to remote ASes have been dead. Previously, it could take up to 1 hour until connectivity was re-established.

  • Resolve a bug that could potentially lead to stale path data being used in the IP-in-SCION tunneling process (this has never been observed in production).

v0.37.4 (2024-10-29)

Warning

When upgrading GATE appliances from an earlier v0.37.x (< v0.37.3) release, you need to manually configure the added management.telemetry.flow_metrics.max_active_flows and system.vpp.statseg.size fields in the appliance configuration. The automatic migration only applies when upgrading from a release older than v0.37.0.

Fixes

  • The appliance now correctly selects the local SCION AS when running the POST /api/v1/cppki/certificates/request HTTP request, even if multiple SCION ASes are configured on the appliance. Previously, the appliance would always use the default SCION AS. If the certificate request targeted an issuer AS that was not reachable from the default SCION AS, e.g., disconnected ISDs, the request would fail due to no paths being available.

  • The appliance now configures a certificate cleanup job on GATE hosts. Previously, certificates were periodically renewed, but no longer cleaned up. As a result, a substantial number of certificates were stored on disk. Paired with the certificate loading logic, which enforces a strict timeout, this could lead to services no longer being able to load certificates. Eventually, the cached certificate would expire, and the services would no longer be able to create signatures for the control plane. Before this happens, you would receive the GatewayASCertificateExpiresSoon alert and could mitigate the problem by manually deleting the expired certificates.

  • Fix a rare race condition in the new path monitoring of the IP-in-SCION tunneling that could lead to stale path monitoring data. In the worst case, an affected component could blackhole traffic by using dead paths. On an affected appliance, the monitored path count decreases over time, and the paths that are no longer reported as monitored have stale data.

  • The IP-in-SCION tunneling now uses an interval of 0.5s for endpoint to endpoint pings instead of 1.5s. This change ensures that the failover in case of a failing remote endpoint happens faster.

  • The IP-in-SCION tunneling no longer suffers from a minimal (<0.1s) traffic interruption in certain cases when switching paths and sessions.

Improvements

  • Minimal reductions in the resource usage of the IP-in-SCION tunneling component.

  • The default cost of the bcrypt algorithm in appliance-cli crypto kdf has been increased from 10 to 12. We recommend that you re-create the hashes according to your threat model.

v0.37.3 (2024-10-10)

Warning

When upgrading GATE appliances from an earlier v0.37.x release, you need to manually configure the added management.telemetry.flow_metrics.max_active_flows and system.vpp.statseg.size fields in the appliance configuration. The automatic migration only applies when upgrading from a release older than v0.37.0.

Fixes

  • The appliance now always returns a validation error if the VPP tun section is missing in the configuration and is required, regardless of whether the VPP section is set.

  • The appliance now only considers the local ISD-ASes configured in the IP-in-SCION tunneling domains when generating the configuration for the tunneling component. This ensures that simply adding a new local ISD-AS to the SCION section will not have an impact on the IP-in-SCION tunneling component if the new ISD-AS is not part of a tunneling domain.

    Note that if there is a domain with no local ISD-ASes specified, the appliance will always consider all local ISD-ASes in the SCION section.

  • The default buffer and worker allocation validation now triggers when a new configuration is pushed. Previously, with the validation introduced in v0.37.2 it only ran when configuring the dataplane, which could lead to a non-running dataplane.

    Also the available memory on the system is now correctly taken into account when doing the validation.

  • The IP-in-SCION tunneling component no longer requires a TRC to be present for a local ISD-AS on start. However, a missing TRC means that no paths via the ISD of the missing TRC are available.

  • The IP-in-SCION tunneling component now handles remote probe port changes correctly.

  • Fix a bug in the dataplane-control service that caused an error when adding a new route and bringing the interface up at the same time. When adding a new route to an interface, the interface’s admin state must be up. This change ensures that the interface is brought up before adding the route.

  • Fix a goroutine leak in GATE appliances.

  • The appliance now validates that the dispatcher port 30041 cannot be allocated to any other service.

Improvements

  • The configuration now supports configuring the maximum number of active flows the GATE keeps track of. A migration will set this field to 500k active flows if flow exporting was configured in the old configuration.

    The GATE holds the flows in the statseg memory of VPP, therefore the statseg memory also needs to be configured in accordance with the maximum number of active flows. The appliance will validate that the statseg memory has a big enough size.

    Currently the validation asserts that the statseg memory is at least 200 x max-active-flows + 32M.

    Note that changing the statseg field will trigger a restart of the dataplane and should be done with care.

    Example configuration snippet:

      {
        "management": {
          "telemetry": {
            "flow_metrics": {
              "collector_url": "http://mgmt.internal.minimal:8080",
              "enabled": true,
              "max_active_flows": 500000
            }
          }
        },
        "system": {
          "vpp": {
            "statseg": {
              "size": "132M"
            }
          }
        }
      }
    

    If flow exporting is not configured, neither field needs to be configured; the defaults are sufficient.

  • The IP-in-SCION tunneling component reports two new metrics:

    • gateway_flow_exporter_flows_total: the number of flows currently being tracked.

    • gateway_flow_exporter_flows_limit: the maximum number of flows that can be tracked.

    This is relevant for GATEs where we track flows for the purpose of billing.

    Additionally there is a new alert GatewayFlowsCloseToLimit which is triggered if the number of tracked flows is over 80% of the limit.
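The statseg sizing rule and the alert threshold described in the v0.37.3 improvements can be worked through with a short Python sketch. It assumes that the rule "200 x max-active-flows + 32M" is expressed in bytes. The notes do not state whether "M" means 10^6 or 2^20, so the multiplier is a parameter. The function names are hypothetical and this is not the appliance's actual validation code.

```python
def min_statseg_bytes(max_active_flows, mega=1_000_000):
    """Minimum statseg size per the documented rule: 200 x flows + 32M.

    Assumption: the result is in bytes. `mega` selects the meaning of "M"
    (10**6 or 2**20), since the release notes do not specify it.
    """
    return 200 * max_active_flows + 32 * mega


def flows_close_to_limit(flows_total, flows_limit, threshold_percent=80):
    """True if the tracked flows are over threshold_percent of the limit."""
    return flows_total * 100 > flows_limit * threshold_percent


flows = 500_000
for mega in (1_000_000, 2**20):
    required = min_statseg_bytes(flows, mega)
    configured = 132 * mega  # "132M" from the example configuration
    print(f"M={mega}: required={required}, configured={configured}, "
          f"sufficient={configured >= required}")
```

Under either reading of "M", the 132M value from the example configuration satisfies the rule for 500000 active flows.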

v0.37.2 (2024-09-19)

Warning

Release v0.37.2 contains known bugs that affect the EDGE and GATE products. We recommend upgrading your EDGE and GATE appliances to this release only if you are running v0.37.0 or v0.37.1. If you are running an older release, we recommend waiting for the next patch release.

Fixes

  • The default number of VPP workers now also takes the available VPP buffers into account, such that validation errors are avoided on certain platforms.

    Previously, the calculation of the default number of VPP workers did not take the available VPP buffers into account. Together with the validation added in v0.37.0, which checks that there is a sufficient number of VPP buffers, this could lead to validation errors if the values were not explicitly set in the appliance config.

  • The dataplane control now successfully updates IP neighbors and routes. Previously, applying updates would fail and in most cases require a dataplane service restart.

  • The IP-in-SCION tunneling process now correctly handles invalid certificate chains (e.g., due to expiry) and continues looking for a valid certificate.

    Previously, an expired certificate could cause the IP-in-SCION tunneling service to not find a valid certificate and result in server unavailability. This could lead to a loss of IP-in-SCION connectivity for 15 minutes until the expired certificate has been cleaned up.

  • Fix a race condition that occasionally could lead to a crash of the IP-in-SCION tunneling process due to incorrectly computed path fingerprints. This only affected setups with multiple local ASes configured.

Improvements

  • The scion-pki trc payload is now more ergonomic to use:

    • The validity.not_before field can now either be a UNIX or an RFC3339 compatible timestamp.

    • The validity.not_after field is an alternative to the validity.validity field which allows setting a UNIX or an RFC3339 compatible timestamp instead of the duration.

    • The cert_files list allows referencing certificates from the predecessor TRC with predecessor:<index>. This way, unchanged certificates do not need to be re-distributed during a TRC ceremony.

  • The new scion-pki trc payload dummy command creates a dummy payload in either PEM or DER format that can be used to test access to the signing keys in preparation for a TRC ceremony.
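To illustrate what accepting "either a UNIX or an RFC3339 compatible timestamp" means for fields such as validity.not_before, here is a minimal Python sketch. It is not how scion-pki parses the payload. The helper name parse_timestamp is hypothetical, and it assumes that UNIX timestamps are given as integer seconds.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Interpret value as UNIX seconds (int) or as an RFC 3339 string."""
    if isinstance(value, int):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so it is normalized to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Both spellings denote the same instant.
print(parse_timestamp(1726704000) == parse_timestamp("2024-09-19T00:00:00Z"))
```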

v0.37.1 (2024-09-11)

Fixes

  • Make sure the frr-exporter exports metrics again if BFD is not configured.

  • The ca-frontend no longer uses 100% CPU.

  • Fix an issue where the IP-in-SCION tunneling component would not read all the locally available IP prefixes.

Improvements

  • The ISD-AS number in the Distinguished Name of the CPPKI certificates is now shown in the output of scion-pki certificate inspect.

v0.37.0 (2024-09-06)

Improvements

Improved IP-in-SCION tunneling path selection algorithm

The internal path and remote endpoint selection algorithm for IP-in-SCION tunneling has been reworked significantly. The new algorithm reduces the traffic caused by the monitoring system and with this also reduces the CPU usage.

The new monitoring will also detect internal network disconnects in the destination ISD-AS and route around them if possible.

With the new monitoring there are also new metrics available and shown in the IP-in-SCION related dashboards. In the dashboards there are now specific panels that are only relevant for releases with version v0.37.0 and newer.

The new metrics are the following:

  • gateway_domain_traffic_redirections_total: indicates the number of traffic redirections per domain and traffic matcher. This metric is shown in the “Traffic redirections” graph on the IP-in-SCION tunneling dashboard. A traffic redirection means that traffic will be routed via a different path or to a different remote endpoint.

  • gateway_domain_paths_total: indicates the number of paths per domain and traffic matcher. This metric is shown in the different “Paths” graphs on the IP-in-SCION tunneling dashboard under the “Domain monitoring >= v0.37” section.

  • gateway_domain_traffic_matcher_sessions_total: indicates the number of sessions per domain and traffic matcher. This is used for alerting. If no session is available for a traffic matcher an alert is triggered as this means potential traffic loss. There is a new alert TunnelingDomainNoAlivePaths that uses this metric.

IP-in-SCION Tunneling Input Filter

The IP-in-SCION tunneling filters tunneled incoming packets.

First, it performs reverse path filtering, which is a security feature that helps to prevent IP spoofing attacks by verifying that the source address of an incoming packet is reachable via the interface that the packet was received on. For IP-in-SCION tunnels, reverse path filtering verifies that the source IP address of a tunneled incoming packet is an address that is being announced by a remote AS in the SCION network.

Second, it verifies that the remote gateway that sent the packet is allowed to send tunneled IP packets with the source IP address that is being used. That is, the remote gateway’s ISD-AS must be configured as an accepted remote ISD-AS in the domain associated with the source IP address.

Third, it verifies that the IP packet was sent over an encrypted tunnel if required.

If any of these checks fail, the packet is dropped.
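The three checks can be sketched in Python as follows. This is a simplified illustration, not the appliance's implementation: the Domain class, the announced mapping, and the function accept_packet are hypothetical names, and real domains and announcements carry more state than shown here.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical, simplified view of a tunneling domain."""
    prefixes: list              # IP prefixes associated with the domain
    accepted_remotes: set       # ISD-ASes allowed to send from these prefixes
    require_encryption: bool = False


def accept_packet(src_ip, remote_isd_as, encrypted, announced, domains):
    """Apply the three checks in order; return True if the packet passes.

    `announced` maps a remote ISD-AS to the list of prefixes it announces.
    """
    src = ipaddress.ip_address(src_ip)

    # 1. Reverse path filtering: the source address must be announced by
    #    a remote AS.
    if not any(src in ipaddress.ip_network(prefix)
               for prefixes in announced.values() for prefix in prefixes):
        return False

    # 2. The sending gateway's ISD-AS must be an accepted remote in the
    #    domain associated with the source address.
    domain = next(
        (d for d in domains
         if any(src in ipaddress.ip_network(p) for p in d.prefixes)),
        None,
    )
    if domain is None or remote_isd_as not in domain.accepted_remotes:
        return False

    # 3. The packet must have arrived over an encrypted tunnel if required.
    if domain.require_encryption and not encrypted:
        return False
    return True
```

A packet that fails any check is dropped, which the sketch models by returning False at the first failing check.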

Configuration

The feature is enabled by default and can be disabled by setting the disable_urpf field in the scion-tunneling.endpoint section of the appliance configuration to true:

{
  "scion-tunneling": {
    "endpoint": {
      "disable_urpf": true
    }
  }
}

Various other improvements

  • The appliance API OAuth integration now supports role aliases. Role aliases can be used to map the role name in the OAuth token to a role name in the appliance API. This is useful for mapping different role names from different identity providers to the same role in the appliance. If no aliases are configured for a role the default aliases are appliance.<role>, appliance/<role>, and appliance:<role>. The appliance API currently only supports the reader and writer roles.

  • Add the SCION link type label to the router_interface_up metric.

  • Updating the MAC of an interface with VLAN sub-interfaces no longer causes a recreation of the sub-interfaces.

  • The appliance-controller and debugscraper systemd services now have a CPU quota set to 100% to ensure that they do not consume more than one CPU.

  • The appliance now validates that the number of buffers allocated by VPP is sufficiently high in relation to the number of VPP worker threads and the number of VPP interfaces. This validation is performed whether or not the number of buffers is explicitly configured.

  • Add SSH information to the appliance-cli info auth command.

  • Add validations to the appliance which ensure that VFs cannot be created on interfaces with ena, virtio-pci or vmxnet3 drivers.

  • ISD-AS numbers in non-canonical form (e.g. with capital letters) are now rejected by the appliance. Because configurations with such numbers did not work correctly before either, no migration is done.

  • The appliance takes the installed system version into account when calculating the etag during configuration fetches and updates.

  • The appliance API now does not report an error if the same TRC is pushed multiple times as long as the contents are identical. This is particularly useful if a TRC bundle is pushed multiple times.

  • The appliance API returns the installed system package version as part of the metadata when querying the GET /config endpoint.

  • The appliance now forces the user to explicitly configure the management API without authentication with the /management/api/unprotected field. Configurations that do not have this field set to true and have no API authentication enabled are considered invalid. For appliances that migrate from previous releases, the unprotected flag is automatically set in migration if no authentication is enabled.

  • The number of buffers allocated by VPP can now be explicitly configured in the appliance configuration.

    {
        "system": {
            "vpp": {
                "num_buffer": 32400
            }
        }
    }
    

    The appliance validates that the memory allocated for VPP buffers fits into the configured hugepages memory.
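The role alias behavior of the OAuth integration described in the list above can be illustrated with a small Python sketch. The default aliases and the reader and writer roles come from the notes. The function names, the shape of configured_aliases, and the assumption that only aliases (not the bare role name) are matched are hypothetical and do not reflect the appliance's actual configuration schema.

```python
ROLES = ("reader", "writer")  # the roles named in the release notes


def default_aliases(role):
    """Default aliases used when none are configured for a role."""
    return {f"appliance.{role}", f"appliance/{role}", f"appliance:{role}"}


def resolve_roles(token_roles, configured_aliases=None):
    """Map role names from an OAuth token to appliance API roles.

    `configured_aliases` is a hypothetical dict: role -> set of aliases.
    A role without configured aliases falls back to the default aliases.
    """
    configured_aliases = configured_aliases or {}
    token_roles = set(token_roles)
    granted = set()
    for role in ROLES:
        aliases = configured_aliases.get(role) or default_aliases(role)
        if aliases & token_roles:
            granted.add(role)
    return granted


print(resolve_roles(["appliance/reader"]))
print(resolve_roles(["netops-admin"], {"writer": {"netops-admin"}}))
```

In this sketch, configuring aliases for a role replaces the defaults for that role, which is one reading of "if no aliases are configured for a role the default aliases are" used.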

Change categories

In the following we list the different change categories that are used in the release notes.

  • Features: Describes new features that have been added. Example: The appliance API can now be protected with OIDC/OAuth2.

  • Improvements: Describes improvements to existing features. Example: The routing table implementation is now 30% faster.

  • Fixes: Describes bug fixes, i.e. previously broken behavior that is now fixed. Example: The appliance no longer crashes when adding a new route.