Performance Optimizations

The following sections describe how to optimize the performance of the appliance. For now, we only provide recommendations for the dataplane, as it is the most performance-critical part of the appliance: it is responsible for forwarding packets. The dataplane is built on top of VPP and mainly uses DPDK for packet I/O. Therefore, general performance optimizations for VPP and DPDK can often be applied to the appliance as well.

System

CPU

Note

If not explicitly configured, the appliance automatically configures itself according to the recommendations given in this section. Hence, it is usually not required to manually configure this section.

This applies to releases v0.36 and later.

The appliance is designed to run on a multi-core CPU with different cores assigned to different tasks. We recommend that core 0 be assigned to Linux and control plane services. Core 1 should be used as the VPP’s main core, which handles management functions (CLI, API, stats collection). The rest of the cores should be used as the VPP’s worker cores, which perform the packet processing.

Note

To configure multiple workers, the network interface must support multiple queues, as each worker is assigned a unique pair of RX/TX queues. Consequently, it does not make sense to configure more workers than the number of queues supported by the network interface.

These configuration options can be set under the config.system.vpp.cpu section of the configuration file. Assuming that the appliance is running on a 4-core CPU, the following configuration is recommended:

{
    "system": {
        "vpp": {
            "cpu": {
                "main_core": 1,
                "workers": 2
            }
        }
    }
}

If the system only has two cores, we recommend using core 0 for Linux and core 1 for VPP. The work usually performed by the worker cores is then handled by VPP’s main core. To explicitly achieve this configuration, set the number of workers to 0 as follows:

{
    "system": {
        "vpp": {
            "cpu": {
                "main_core": 1,
                "workers": 0
            }
        }
    }
}

It is also possible to run the appliance on a single-core CPU and have VPP share the core with Linux.
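For illustration, a minimal sketch of such a single-core setup, assuming that setting main_core to 0 together with workers set to 0 makes VPP share core 0 with Linux (the exact semantics may differ between releases):

{
    "system": {
        "vpp": {
            "cpu": {
                "main_core": 0,
                "workers": 0
            }
        }
    }
}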

Note

If packet processing is not performed on a dedicated core, the performance of the appliance will be significantly reduced. In particular, jitter and latency will be negatively affected.

Hugepages

The dataplane uses hugepages (see the Linux kernel documentation) for its packet buffers, so it is important that enough hugepages are available. The size and number of hugepages allocated by the appliance can be configured in the config.system.kernel section of the appliance configuration:

{
    "system": {
        "kernel": {
            "hugepage_size": "2M",
            "hugepages": 256
        }
    }
}

By default, 256 2MB hugepages are allocated. This is sufficient for most deployments. If there is a large number of fast interfaces (e.g., 25 Gbps), it might be necessary to increase the number of hugepages.
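For example, doubling the default to 512 hugepages (at 2 MB each, roughly 1 GB of reserved memory) could look as follows; the exact value needed depends on the deployment:

{
    "system": {
        "kernel": {
            "hugepage_size": "2M",
            "hugepages": 512
        }
    }
}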

Note

If there are not enough hugepages available, the dataplane either fails to start or logs an error message.

Note

Currently, the appliance only supports 2MB hugepages.

Buffers

If there is a large number of interfaces and worker threads, it might be necessary to increase the number of buffers. By default, the appliance uses a fixed portion of the hugepages to allocate buffers; hence, we recommend increasing the number of buffers by increasing the number of hugepages. Alternatively, the number of buffers can be configured explicitly in the config.system.vpp.buffers section:

{
    "system": {
        "vpp": {
            "buffers": {
                "data_size": 9000,
                "num_buffers": 32400
            }
        }
    }
}

Note

The memory allocated for buffers must fit into the allocated hugepages.
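As a rough, illustrative sanity check for the example above (ignoring per-buffer metadata and other consumers of hugepage memory): 32400 buffers with a data size of 9000 bytes require roughly 280 MB, which fits within the 512 MB provided by the default 256 hugepages of 2 MB each.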

RX/TX Queues and Descriptors

By default, the appliance configures each interface with one RX queue per worker core. The default RSS hash function over the packet’s 5-tuple is used to distribute the incoming traffic among the workers.

Note

The number of RX/TX queues cannot be independently configured through the appliance API. This will be added in a future release.

The number of descriptors per RX/TX queue is by default set to 1024, which is sufficient for most deployments. If there is a large number of fast interfaces (e.g., 25 Gbps), it might be necessary to increase the number of descriptors. We recommend 2048 descriptors for a 25 Gbps interface and 4096 descriptors for a 100 Gbps interface.

The number of descriptors can be configured individually for each interface in the config.interfaces.<type> section of the appliance configuration:

{
    "interfaces": {
        "ethernets": [
            {
                "name": "eth0",
                "num_rx_desc": 2048,
                "num_tx_desc": 2048
            }
        ]
    }
}

Note

The number of RX/TX descriptors can only be configured for VPP interfaces of the type ethernets and virtual_functions.
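As a purely illustrative sketch, assuming that virtual_functions entries accept the same num_rx_desc and num_tx_desc fields as ethernets (the remaining fields shown here, such as name, are hypothetical and may differ in the actual schema):

{
    "interfaces": {
        "virtual_functions": [
            {
                "name": "eth0vf0",
                "num_rx_desc": 2048,
                "num_tx_desc": 2048
            }
        ]
    }
}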

SCION

SCION RSS

SCION RSS significantly enhances throughput on multi-core systems by enabling receive side scaling (RSS) for SCION traffic, allowing our appliances to utilize the full potential of the available computational resources. SCION RSS operates by leveraging source port entropy on the UDP underlay. For the feature to work properly, both sides of a link must support it, i.e., both must be running at least release v0.34. In the following, we describe how to configure SCION RSS on links between neighbor appliances and between sibling appliances.

Note

A neighbor appliance is an appliance that is directly connected to the appliance but located in a different AS.

A sibling appliance is an appliance that is located in the same AS as the appliance.

Traffic between Neighbor Appliances

To enable SCION RSS for traffic forwarded to a particular neighbor, enable it on all interfaces connected to that neighbor. This can be done by setting the enable_scion_rss option to true in the config.scion.ases.neighbors.interfaces sections:

{
    "scion": {
        "ases": [
            {
                "neighbors": [
                    {
                        "interfaces": [
                            {
                                "interface_id": 1,
                                "address": "[fd02:e8a2:c9e2:03e6::2]:30100",
                                "remote": {
                                    "address": "[fd02:e8a2:c9e2:03e6::1]:30100",
                                    "interface_id": 201
                                },
                                "enable_scion_rss": true
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

Warning

Only enable SCION RSS for neighbor appliances that support it, i.e., that are running at least release v0.34. This usually requires some information exchange with the operator of the neighbor appliance.

Traffic between Sibling Appliances

With topology synchronization, the appliance automatically detects sibling appliances that support SCION RSS and enables the feature accordingly.

Without topology synchronization, whether a sibling appliance supports SCION RSS can be declared statically, and the feature is enabled accordingly. This is done by setting scion_rss to true or false in the config.cluster.peers.features section:

{
    "cluster": {
        "peers": [
            {
                "features": {
                    "scion_rss": true
                }
            }
        ]
    }
}

Warning

Only enable SCION RSS for sibling appliances that support it, i.e., that are running at least release v0.34.

By default, SCION RSS is not enabled for traffic sent from a gateway to a sibling router on a different appliance. To enable it, set enable_scion_rss to true in the config.scion_tunneling.endpoint section.
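A minimal sketch of this setting, with the nesting inferred from the dotted path above (it may need to be adapted to the actual configuration schema):

{
    "scion_tunneling": {
        "endpoint": {
            "enable_scion_rss": true
        }
    }
}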

Warning

Only enable SCION RSS for sibling appliances that support it, i.e., that are running at least release v0.34.