Appliance Release v0.36

This page contains the release notes for the v0.36 Anapaya appliance software release. The appliance software release is applicable for the following Anapaya products:

  • Anapaya CORE

  • Anapaya EDGE

  • Anapaya GATE

We recommend always upgrading to the latest available patch release. Refer to the Upgrade Notes (if any) of each release for special steps to take when upgrading. For general information on how to upgrade your appliance, refer to the Appliance Update Guide.

Known Issues

  • Appliance configurations that contain IP prefixes in non-canonical form can lead to IP-in-SCION tunneling service crashes in releases prior to v0.36.3.

    On releases v0.36.2 and older, the following accept filter could lead to a crash of a software component and loss of IP-in-SCION tunneling connectivity:

    "accept_filter": [
      {
        "action": "ACCEPT",
        "prefixes": [
          "198.51.100.210/28"
        ],
        "sequence_id": 0
      }
    ]
    

    Instead, the accept filter should be expressed in its canonical form:

    "accept_filter": [
      {
        "action": "ACCEPT",
        "prefixes": [
          "198.51.100.208/28"
        ],
        "sequence_id": 0
      }
    ]
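
Prefixes can be checked before a configuration is uploaded. The following is a minimal sketch that assumes "canonical form" means that all host bits of the prefix are zero, as the example above suggests. The helper names are illustrative and not part of the appliance software.

```python
import ipaddress


def canonical_prefix(prefix):
    """Return the prefix with all host bits cleared."""
    return str(ipaddress.ip_network(prefix, strict=False))


def is_canonical(prefix):
    """Report whether the prefix parses with no host bits set.

    Strings that are not valid prefixes at all also return False.
    """
    try:
        ipaddress.ip_network(prefix, strict=True)
    except ValueError:
        return False
    return True


print(is_canonical("198.51.100.210/28"))      # False
print(canonical_prefix("198.51.100.210/28"))  # 198.51.100.208/28
```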
    

Upgrade Notes

Warning

In release v0.36.3, the appliance controller has been extended with additional validation rules. If you are upgrading from a previous version and your appliance configuration contains IP prefixes in non-canonical form, the appliance controller will start and wait for the configuration to be fixed.

To ensure that your appliance is healthy, check the appliance controller logs after the update:

journalctl -u appliance-controller.service -e

Verify that the logs do not contain any log events with the following message:

"validating latest configuration","cause":{"msg":"invalid configuration"...

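The log check can also be scripted. The sketch below is one possible approach, not an official tool: it assumes it runs on the appliance itself, and it only matches the two substrings from the message above. The function names and the sample log line are made up for illustration.

```python
import subprocess

# Substrings taken from the log message shown above.
MARKERS = ("validating latest configuration", "invalid configuration")


def find_invalid_config_events(lines):
    """Return the log lines that contain both marker substrings."""
    return [line for line in lines if all(m in line for m in MARKERS)]


def read_controller_logs():
    """Read the appliance controller journal (requires journalctl access)."""
    result = subprocess.run(
        ["journalctl", "-u", "appliance-controller.service", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Hypothetical sample lines; real log entries contain more fields.
sample = [
    '{"msg":"validating latest configuration","cause":{"msg":"invalid configuration"}}',
    '{"msg":"configuration applied"}',
]
print(len(find_invalid_config_events(sample)))  # 1
```

On the appliance, find_invalid_config_events(read_controller_logs()) returns an empty list if no such event was logged.
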
Warning

SSH keys for the root and anapaya users are now managed via the appliance configuration. If you have previously configured SSH keys for these users, the appliance will automatically migrate them to the appliance configuration. Please refer to SSH Configuration for more information.

We recommend verifying that the SSH keys were migrated correctly after the upgrade by checking the appliance configuration. Note that if you upload a configuration with no SSH keys configured, the appliance will remove all SSH keys from the root and anapaya users and you might lose access to the appliance.

Warning

Releases v0.36.0 and newer automatically configure multiple packet forwarding cores (if available). As a result, multiple streams are used to send IP-in-SCION tunneled traffic to a remote EDGE or GATE appliance. EDGE and GATE appliances on releases older than v0.34.0 do not support this feature. If you have such remote peers, you should configure only a single forwarding core in the system.vpp.cpu.workers configuration:

{
  "system": {
    "vpp": {
      "cpu": {
        "workers": 1
      }
    }
  }
}

Warning

Releases v0.34.0 and newer require Ubuntu 22.04 as the operating system and version 2.8.0 or newer of the anapaya-system package.

v0.36.5 (2024-08-16)

Improvements

  • The SCION control service now includes the beacon ID when logging beacon propagation errors. This allows operators to track down offending beacons more easily.

  • The SCION control service now uses a more optimized verification function during signature verification based on AS certificate chains.

  • The SCION control service now validates that it has the required AS certificate chains when propagating beacons. Previously, in rare edge cases (e.g., after manually deleting an AS certificate from the trust database) the service would propagate beacons that could not be verified by the receivers due to missing certificate chains.

v0.36.4 (2024-07-29)

Fixes

  • The automatically generated list of allowed interfaces in the local topology is now fixed.

  • Fixed a bug where the VPP_VMXNET3 driver would not work in certain circumstances.

  • The gateway_prefix_fetch_invalid_total metric is now correctly reporting values again.

  • Prevent path changes from being counted if they only differ in expiry in the gateway_session_path_changes metric.

  • The gateway_prefixes_advertised metric is correctly reporting values again.

v0.36.3 (2024-07-18)

Fixes

  • The appliance controller now requires all IP prefixes to be provided in canonical form. In prior releases, a non-canonical IP prefix in the accept filter could crash the IP-in-SCION tunneling service.

  • The gateway_frames_discarded_total metric is now correctly reported again.

  • Adding or deleting traffic matchers of a domain now properly forwards traffic for the affected prefixes of the domain instead of silently dropping it.

  • Changes to discovered metadata from a remote IP-in-SCION tunneling endpoint are now correctly taken into account. Previously, changes like updated allowed interfaces were ignored.

Improvements

  • The rate at which the local metrics are scraped and injected into the log journal for the debug information archive has been reduced.

v0.36.2 (2024-06-14)

Fixes

  • The journald log entries are now exposed via /api/v1/debug/logs/entries, as expected by the specification. Before, the endpoint was exposed via /api/v1/logs/entries.

  • The appliance-controller now correctly generates the gateway configuration when SCION RSS is enabled in the appliance configuration.

  • The GATE flow exporter can no longer crash when a large number of flows are added and deleted.

  • Monitoring a large number of paths will no longer lead to dropped path monitoring probes in the IP-in-SCION tunneling component.

  • The IP-in-SCION tunneling endpoint now properly discovers remote endpoints in a disconnected AS. Previously, failover towards EDGE gateways did not work in such a scenario.

Improvements

  • The IP-in-SCION tunneling component now monitors at most 100 paths per remote AS. This is a measure to prevent high load on topologies with very high path diversity.

v0.36.1 (2024-05-28)

Fixes

  • The VPP dataplane no longer experiences a rare crash that could happen when gateway flow metrics were enabled (GATE only).

  • The syslog can no longer fill up the log partition.

v0.36.0 (2024-05-23)

Features

Source NATing for outgoing traffic

The appliance now supports source NATing for outgoing traffic. While outgoing traffic on any interface can be NATed, this feature is particularly useful for NATing traffic that will be sent over an IP-in-SCION tunnel. Outgoing source NAT is useful for deployments that only have a single (or few) public IP address(es) that can be tunneled through an IP-in-SCION tunnel. The NAT allows multiple internal hosts to share the same public IP address.

An operator can configure the NAT address pool, i.e., the list of IPv4 prefixes that can be used as public IP addresses for NATing. These addresses should also be announced to remote IP-in-SCION tunneling endpoints. Furthermore, the operator defines the outgoing interfaces to which the NAT should be applied; in most cases, this will be the special scion-gateway interface. Finally, it is also possible to exclude certain addresses from being NATed, e.g., in case a host should be reachable directly via its public IP address.

For more information on how to configure source NATing, please refer to Network Address Translation (NAT) and for a specific example on how to configure outgoing source NATing for IP-in-SCION traffic, please refer to Configuring egress NAT.

Automatic forwarding core assignment

The appliance uses the Vector Packet Processor (VPP) framework as its forwarding dataplane. VPP could already be configured to use multiple CPU cores (workers) for packet processing through the system.vpp.cpu.workers setting. Starting with this release, the appliance automatically calculates a suitable worker configuration based on the number of available CPU cores. The automatic assignment of worker threads takes sibling cores into account when hyper-threading is enabled and aims to avoid assigning workers to sibling cores of the control plane. These values can be overridden by explicitly configuring the system.vpp.cpu.main_core, system.vpp.cpu.workers, and/or system.vpp.cpu.corelist_workers entries. However, for most deployments, the automatic assignment should produce the best results.

Note

Each configured worker is pinned to a separate CPU core. These workers will consume 100% of the core they are pinned to, because the worker is constantly polling for packets.
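
To illustrate the idea of sibling-aware assignment, the sketch below picks worker cores that are neither control plane cores nor hyper-threading siblings of them. This is not the appliance's actual algorithm, which is not described here. The function name, the input format, and the treatment of the sibling rule as a hard constraint are all simplifying assumptions.

```python
def pick_worker_cores(siblings, control_plane_cores, max_workers=None):
    """Pick worker cores that avoid control plane cores and their siblings.

    siblings maps each logical core ID to the set of its hyper-threading
    sibling core IDs (an empty set when hyper-threading is disabled).
    """
    reserved = set(control_plane_cores)
    for core in control_plane_cores:
        reserved |= set(siblings.get(core, ()))
    candidates = [core for core in sorted(siblings) if core not in reserved]
    return candidates if max_workers is None else candidates[:max_workers]


# Hypothetical machine with 4 physical cores and hyper-threading enabled:
# logical cores N and N+4 share one physical core.
topology = {core: {(core + 4) % 8} for core in range(8)}
print(pick_worker_cores(topology, control_plane_cores=[0]))
# [1, 2, 3, 5, 6, 7]
```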

Management of SSH keys via the appliance configuration

The appliance can now manage SSH keys for the root and anapaya users on the appliance. Other users are not supported on the appliance. An arbitrary number of authorized SSH public keys can be configured for those users. Keys must be base64-encoded as used in the authorized_keys file, e.g., as produced by the ssh-keygen tool.

Refer to SSH Configuration for more information on how to configure SSH keys for the root and anapaya users.
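
Before adding a key to the configuration, it can help to check that it is well-formed. The sketch below assumes the common OpenSSH public key line format, `<key type> <base64 key> [comment]`, in which the decoded key starts with a length-prefixed copy of the key type. Whether the appliance configuration expects the full line or only the base64 field is not stated here (see SSH Configuration). The function name is illustrative.

```python
import base64
import binascii
import struct


def parse_public_key(line):
    """Split an OpenSSH public key line into (key type, decoded key blob).

    Raises ValueError if the line is malformed.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key type> <base64 key> [comment]'")
    key_type, encoded = fields[0], fields[1]
    try:
        blob = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"key is not valid base64: {exc}") from exc
    if len(blob) < 4:
        raise ValueError("key blob is too short")
    # The blob starts with a 4-byte big-endian length and the key type name.
    (name_len,) = struct.unpack(">I", blob[:4])
    if blob[4:4 + name_len] != key_type.encode("ascii"):
        raise ValueError("key type does not match the encoded key blob")
    return key_type, blob


# Synthetic ssh-ed25519 blob with all-zero key material, for illustration only.
demo_blob = struct.pack(">I", 11) + b"ssh-ed25519" + struct.pack(">I", 32) + bytes(32)
demo_line = "ssh-ed25519 " + base64.b64encode(demo_blob).decode("ascii") + " operator@example"
print(parse_public_key(demo_line)[0])  # ssh-ed25519
```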

Simultaneous IPv4 and IPv6 BGP peering and BFD support

The appliance now supports the simultaneous configuration of IPv4 and IPv6 BGP neighbors. Separate neighbors must be configured for each address family. Exchanging IPv4 and IPv6 routes over the same BGP session is not supported. Furthermore, the appliance now supports BFD for BGP sessions. This allows for faster detection of BGP session failures. Use the new bgp.bfd configuration options to enable BFD for BGP sessions.

Refer to Configuring BGP for more information on how to configure BGP neighbors and BFD for BGP sessions.

Appliance debug information archive

The appliance now supports generating a debug information archive containing important state, configuration, and log information. This archive can be used by Anapaya support and engineering to diagnose issues with the appliance. To create an archive, run the appliance-cli debug dump --duration <duration> command. The duration specifies the time range for which logs should be included in the archive and is set to 1 hour by default.

Improvements

Improved behavior for path changes in the IP-in-SCION tunneling service

The IP-in-SCION tunneling service has a complex control plane that is responsible for configuring the data plane based on path and routing policies, announced prefixes, availability of remote endpoints, and health and performance characteristics of SCION paths. In our previous release, we rewrote the IP-in-SCION tunneling service from scratch to improve its overall scalability. In this release, we have further improved the behavior of the reconfiguration pipeline specifically for path changes. A path change now triggers only a minimal amount of reconfiguration, which reduces the pressure on the dataplane API under high churn conditions.

Improved control over announced and accepted IP prefixes

The announce and accept filters for IP prefixes in routing domain configurations now allow more control over which prefixes are announced and accepted. For example, if a remote gateway announces 1.2.3.0/24 but the local prefix filter accepts 1.2.3.0/25 and 1.2.3.128/25, the two /25s are announced to the local network. Additionally, the original /24 is also announced to the local network, since the combined set of accept filters covers the /24.
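
The example above can be reproduced with a short sketch. It only models the case described in the text: accept prefixes inside the remote announcement are announced, and the remote prefix itself is announced when the merged accept prefixes cover it. How the appliance behaves in other cases is not specified here, and the function name is illustrative.

```python
import ipaddress


def locally_announced(remote_prefix, accept_prefixes):
    """Return the prefixes announced locally for one remote announcement."""
    remote = ipaddress.ip_network(remote_prefix)
    accepts = [
        net for net in map(ipaddress.ip_network, accept_prefixes)
        if net.version == remote.version
    ]
    # Accept prefixes that are more specific than (or equal to) the remote one.
    announced = {net for net in accepts if net.subnet_of(remote)}
    # The remote prefix itself, if the merged accept prefixes cover it.
    merged = ipaddress.collapse_addresses(accepts)
    if any(remote.subnet_of(net) for net in merged):
        announced.add(remote)
    return [str(net) for net in sorted(announced)]


print(locally_announced("1.2.3.0/24", ["1.2.3.0/25", "1.2.3.128/25"]))
# ['1.2.3.0/24', '1.2.3.0/25', '1.2.3.128/25']
```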

New functionality for the appliance CLI

  • appliance-cli service metrics level sets the metric level of SCION services. The default level is prod. With the level set to debug, the service exposes additional metrics.

  • appliance-cli crypto forwarding-key --size X generates a SCION AS forwarding key on the appliance.

  • appliance-cli info auth shows the configured management API users and whether they are using the insecure default password.

  • appliance-cli info network shows network interface state information.

  • appliance-cli info scion shows SCION state information.

  • appliance-cli info tunneling prints the status of the IP-in-SCION tunneling. For each domain it reports the number of prefixes that are received/advertised and the remote endpoints per ISD-AS.

Improved appliance health API endpoint

A new health API endpoint, GET /api/v1/health, has been added to the appliance. It reports the health status of the appliance based on a set of health checks that are executed. The appliance-cli get health command shows the health status of the appliance. This endpoint should be preferred over the existing GET /api/v1/debug/services/{service_name}/health endpoints.

For more details about the health checks refer to the Appliance API Specification.
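
As a rough sketch, the endpoint could be queried from a monitoring script as follows. The base URL is a placeholder, the response is assumed to be JSON, and authentication and TLS settings are omitted even though the management API likely requires them. Consult the Appliance API Specification for the actual response schema.

```python
import json
import urllib.request


def health_url(base_url):
    """Build the URL of the appliance health endpoint."""
    return base_url.rstrip("/") + "/api/v1/health"


def fetch_health(base_url, timeout=5.0):
    """Fetch the health report and decode it, assuming a JSON body."""
    request = urllib.request.Request(health_url(base_url), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder address; replace it with the management address of the appliance.
print(health_url("https://appliance.example/"))
# https://appliance.example/api/v1/health
```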

Configurable CPU and memory resource limits per service

The CPU and memory resource limits can now be configured per service using the system.resources.service_limits section. This allows fine-grained control over how much CPU and memory each service can use. Furthermore, the defaults for all performance-critical services have been adjusted to better match the actual resource requirements of these services.

Note

This is a feature for expert users. The default values should be sufficient for most deployments.

Various other improvements

  • The size of RX/TX queues for VPP interfaces can now be configured on the appliance. The default value is 1024. This is an expert setting and the default value should be sufficient for most deployments.

  • The VRRP preempt mode can be disabled by setting the no_preempt option to true in the VRRP configuration. This can be used to prevent a backup from taking over the master role even if it has a higher priority.

  • The appliance API now validates that the configured VPP main thread and the number of VPP worker threads are valid with respect to the CPUs available on the appliance.

  • The validation of the appliance cluster configuration is now stricter and detects more misconfigurations. Specifically, it detects duplicate cluster synchronization endpoints, duplicate SCION shards, duplicate SCION control addresses, and duplicate SCION neighbor interfaces.

Fixes

  • The appliance-cli info command no longer reports misleading health status information for the appliance.

  • The BGP ASNs 23456, 65535 and 4294967295 are no longer allowed to be configured on the appliance. These ASNs are reserved and should not be configured.

  • The VRRP preempt mode is now enabled by default, as defined in RFC 5798. This means that the master router will preempt the backup router if it comes back online after a failure. This is a change from the previous behavior where the preempt mode was disabled by default.

  • The appliance now correctly configures the firewall to allow SCION traffic on the internal SCION interface if the interface is configured using the LINUX driver. Before, the firewall would block SCION traffic on such interfaces.

  • The VPP main thread will now no longer use 100% of a CPU core when the number of VPP worker threads is larger than 0.

  • IP prefixes configured in the static announcements are now properly announced again after deconfiguring the next-hop IP address.

Breaking changes

  • For all static-announcements, next-hop tracking must be defined by default. The enabled flag is replaced by a disabled flag, which is set to false by default. If disabled is set to true, next-hop tracking is disabled. The automatic configuration migration adapts the configuration accordingly. Make sure that you have a next-hop set if you have static announcements.