Performance Evaluation¶
Forwarding performance measurements¶
We continuously test the forwarding performance of the qualified hardware platforms. To determine the forwarding performance, we run a set of tests for each hardware platform. The individual test cases are described in separate sections below.
Hardware Setup¶
Depending on the processing that is tested, one of the following physical topologies is used:
2-Node Topology: Consists of the Traffic Generator (TG) and the Device under Test (DUT) connected over a switch. Used for evaluating forwarding processing of the DUT, such as IP or SCION packet routing.
3-Node Topology: Consists of the Traffic Generator (TG), the Device under Test (DUT), and a secondary Device under Test (DUT2) connected over a switch. Used for evaluating the tunneling processing of the DUT, i.e., encapsulation and decapsulation, such as the IP-in-SCION tunneling. The DUT2 serves as the opposite side of the tunnel and is sufficiently sized to not be a bottleneck of the test.
For devices that have multiple network interfaces, each type of interface is considered separately. For example, a device with 2x 10Gbps and 2x 1Gbps interfaces will have separate tests for the 10Gbps and 1Gbps interfaces.
Methodology¶
To determine the forwarding performance of a device under test, we run the multiple loss ratio search (MLRsearch) to find the non-drop rate (NDR) of the device. MLRsearch is described in an IETF Internet-Draft; in short, it is an optimized method to determine the packet throughput rate of a device under test. The NDR is the maximum packet throughput rate at which no packets are lost. To measure the NDR, we use the T-Rex traffic generator.
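For illustration, the core idea behind an NDR search can be sketched as a binary search over the offered load. The measure_loss() helper below is hypothetical (it stands in for one trial on the traffic generator), and the actual MLRsearch algorithm described in the draft is considerably more elaborate (multiple loss-ratio targets, bounded search phases, and trial-duration handling):

```python
def find_ndr(measure_loss, max_rate_pps, precision_pps=1000, trial_s=30):
    """Simplified NDR search: binary-search the highest offered load (pps)
    at which a trial of trial_s seconds shows zero packet loss.

    measure_loss(rate_pps, duration_s) is a hypothetical helper that runs
    one trial on the traffic generator and returns the number of lost packets.
    """
    lower, upper = 0, max_rate_pps
    while upper - lower > precision_pps:
        candidate = (lower + upper) // 2
        if measure_loss(candidate, trial_s) == 0:
            lower = candidate   # no loss: the NDR is at least this high
        else:
            upper = candidate   # loss observed: the NDR is below this rate
    return lower
```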
NDR is measured for different packet sizes:
minimum packet size, i.e., no payload
maximum packet size, i.e., 1500 bytes
IMIX_v4_1 (28x minimum, 16x ~570 bytes, 4x maximum)
Note that the actual packet size can vary slightly depending on the traffic generated (IPv4, IPv6, SCION).
The generated traffic uses multiple network flows, each with a unique destination address, such that packet processing is distributed across the device’s processing cores (if multiple cores are available).
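As an illustration of such flow variation, the T-Rex stateless Python API can vary the destination address per packet with its field engine, so that consecutive packets belong to distinct flows. The sketch below is minimal; the addresses, the flow count, and the import path (which differs between T-Rex versions) are illustrative:

```python
from trex_stl_lib.api import *  # T-Rex stateless API; also provides the Scapy layers

def multi_flow_stream():
    # Base IPv4/UDP packet; T-Rex pads the payload to the requested frame size.
    base = Ether() / IP(src="16.0.0.1", dst="48.0.0.1") / UDP(sport=1025, dport=12)
    # Field-engine program: step the destination address through 256 values so
    # that consecutive packets map to 256 distinct flows, then fix the checksum.
    vm = STLScVmRaw([
        STLVmFlowVar(name="ip_dst", min_value="48.0.0.1",
                     max_value="48.0.1.0", size=4, op="inc"),
        STLVmWrFlowVar(fv_name="ip_dst", pkt_offset="IP.dst"),
        STLVmFixIpv4(offset="IP"),
    ])
    return STLStream(packet=STLPktBuilder(pkt=base, vm=vm), mode=STLTXCont())
```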
Test Coverage¶
IP routing¶
To establish a baseline for a device, we evaluate the NDR of the IPv4 forwarding performance in the 2-node setup. This ensures that the network interface card (NIC) works as expected with the VPP software stack that we use. In this test case, we expect to reach the line rate of the NIC (for IMIX packet sizes). The DUT is configured without any SCION components; only IP routing is enabled.
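As a rough sanity check of the line-rate expectation, the theoretical maximum packet rate of a 10 Gbps interface can be estimated from the average IMIX frame size plus the fixed 20 bytes of per-frame Ethernet overhead (preamble and inter-frame gap). The frame sizes below are illustrative, since the exact sizes depend on the generated traffic:

```python
# Illustrative IMIX_v4_1 distribution: (frame size in bytes, packets per group).
IMIX = [(64, 28), (570, 16), (1500, 4)]
OVERHEAD = 20  # preamble (8 B) + inter-frame gap (12 B) per frame

def line_rate_pps(link_bps):
    total_pkts = sum(n for _, n in IMIX)
    avg_wire_bytes = sum((size + OVERHEAD) * n for size, n in IMIX) / total_pkts
    return link_bps / (avg_wire_bytes * 8)

print(f"{line_rate_pps(10e9):,.0f} packets/second")  # roughly 3.36 Mpps
```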
This test case is performed for all devices.
SCION forwarding¶
This test case is used to determine the NDR of the SCION forwarding performance of the DUT in the 2-node setup. The DUT is configured as a SCION router with one internal and one SCION interface. Both interfaces are connected to the TG, which generates SCION packets and sends them to both interfaces.
This test case is mainly for CORE devices.
IP-in-SCION tunneling¶
This test case is used to determine the NDR of the DUT when IP packets are encapsulated and decapsulated into SCION packets using the IP-in-SCION tunneling component. This test is performed in the 3-node setup. The DUT is configured as in the SCION routing test, but with the addition of the IP-in-SCION tunneling component. The DUT2 is used as the opposite side of the SCION link and the SCION tunnel. The TG generates IP packets, just like an IP host would, and sends them to the DUT (or DUT2, depending on the direction).
This test case is mainly for EDGE and GATE devices.
IP-in-SCION tunneling with encryption¶
This test case is the same as the IP-in-SCION tunneling test, but with encryption enabled.
This test case is mainly for EDGE devices.
Flow metric reporting performance measurements¶
A GATE can export flow metrics of the IP traffic that it sends over the SCION Internet. The flow metrics include the number of packets and bytes transferred for each IP flow, as well as the start and end time of the flow. The flow metrics are exported to a flow collector in the cloud for further processing. Because the flow metrics are used for billing purposes, it is crucial that all flows are correctly reported.
Therefore, we continuously test the flow reporting performance of the GATE. The test ensures that the device correctly reports all flows to the flow collector under particular load conditions, i.e., traffic patterns.
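For illustration, one flow record of this kind could be modeled as follows; the field names are assumptions for this sketch and not the exporter's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class FlowRecord:
    """Illustrative per-flow metrics as sent to the flow collector
    (field names are assumptions, not the actual export schema)."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int
    packet_count: int     # packets transferred in this flow
    byte_count: int       # bytes transferred in this flow
    start_time: datetime  # first packet of the flow observed
    end_time: datetime    # last packet of the flow observed
```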
Hardware Setup¶
The hardware setup is the same as for the IP-in-SCION tunneling performance test. Additionally, the DUT is configured with the flow exporter component enabled. The flow collector is an additional host connected to the DUT’s management port.
Methodology¶
To determine the flow reporting performance of a device under test, we run the flow exporter test. The test performs multiple runs, each with a particular traffic pattern, and validates that (1) no packets were dropped and (2) all flows and packets were reported correctly. Traffic is sent for multiple minutes to reach a steady state of flow processing and reporting.
The traffic patterns for the flow exporter test are defined by the following parameters:
Traffic rate: The rate at which packets are sent to the DUT (packets/second or bits/second).
Traffic type: The type of traffic, mainly the packet size.
Number of persistent flows: The number of network flows that are continuously sent to the DUT.
Number of transient flows: The number of network flows that are sent to the DUT for a limited time only, specified as new flows per second (fps).
For simplicity, we always use the same traffic type, the IMIX_v4_1 packet size distribution, and the same traffic rate of 100'000 packets/second.
The number of persistent flows is fixed at 256, leaving the number of transient flows as the variable parameter. We argue that transient flows require more processing resources than persistent flows, because (1) the dataplane has to create and delete more flow entries and (2) the flow exporter has to handle more flow records. Therefore, the number of transient flows is a good indicator of the flow reporting performance.
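To make the relationship between these parameters and the number of reported flows concrete, the following sketch computes the number of unique flows one run is expected to produce. The helper is purely illustrative and not part of the test harness; the 60-second duration and the 10'000 fps figure are example values that match the interpretation below:

```python
def expected_unique_flows(persistent_flows, transient_fps, duration_s):
    """Number of unique flows the exporter should report for one run:
    the fixed set of persistent flows plus one new transient flow per
    second for each unit of the transient-flow rate."""
    return persistent_flows + transient_fps * duration_s

# Example: 256 persistent flows, 10'000 new flows/s, 60-second run
print(expected_unique_flows(256, 10_000, 60))  # 600'256 unique flows
```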
Test Coverage¶
The flow exporter test is performed for all GATE devices.
Test Result Interpretation¶
Assume the DUT is able to handle 10'000 fps under the conditions defined above.
This means that 600'000 unique flows (10'000 new flows per second × 60 seconds) were successfully processed and reported per minute, assuming an equal distribution of flow start times. In a real-world scenario, the number of flows can be interpreted as the number of users accessing a service, i.e., the DUT can handle 600'000 users within one minute.
If flow start times are not equally distributed, it is possible that flow records are lost, which means that some flows will not be accounted for. The loss of flow records is detected by the DUT and reported.
Automated Performance Evaluation and Regression Detection¶
Performance evaluation runs automatically on a nightly or weekly schedule. The results are stored and compared to previous results. If a performance regression is detected, the team is notified and countermeasures are taken. This way, performance regressions are caught before they reach new releases.
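A minimal sketch of such a regression check could compare the latest NDR measurement against a stored baseline with a relative tolerance; the 5% threshold and the packet-rate values are assumptions for illustration:

```python
def check_regression(current_ndr_pps, baseline_ndr_pps, tolerance=0.05):
    """Return True if the current NDR has dropped more than `tolerance`
    (relative) below the baseline, i.e., a regression to report."""
    return current_ndr_pps < baseline_ndr_pps * (1.0 - tolerance)

# Example: baseline 3.0 Mpps, current run 2.7 Mpps -> 10% drop -> regression
if check_regression(2_700_000, 3_000_000):
    print("Performance regression detected: notify the team")
```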