Basic Troubleshooting Guide

A network can malfunction for a wide range of reasons, ranging from hardware problems to software issues. Here, we give some basic guidance on how to troubleshoot common network issues. We mainly focus on connectivity failures caused by the misconfiguration of SCION services.

SCION Connectivity Issues

Issue: You are operating the SCION AS 1-ff00:1:1 and you have been notified that connectivity (either SCION connectivity or IP connectivity through an IP-in-SCION tunnel) from the host EDGE-1 in your AS to the neighboring AS 1-ff00:1:2 has been lost.

Note

In practice, you should ideally be informed about such an incident by your alerting system, which sits on top of the monitoring system. You might therefore be able to extract information from the alerts that leads you to the source of the issue. For this guide, we do not rely on such information, as it depends on your monitoring and alerting systems.

Note

The steps taken here for troubleshooting should be perceived solely as recommendations. Furthermore, they are meant to assist you with resolving only a small subset of issues you might encounter in practice.

A reasonable first step is to log into EDGE-1 and check the set of SCION paths to the AS 1-ff00:1:2.

In Not All Expected Paths Are Alive, we consider the case where you do not see the full set of paths you expect and then explain two potential causes and how to resolve them.

In All Expected Paths Are Alive, we cover the scenario where all the expected paths are actually alive. We again discuss two possible causes and explain how to resolve them.

Not All Expected Paths Are Alive

A basic sanity check for SCION connectivity-related issues is to log into EDGE-1 in the AS 1-ff00:1:1 and run the showpaths command. This command shows us the set of available paths to a particular destination. With the --refresh flag, we can force the scion tool to grab fresh paths from the local SCION control service. The command to run showpaths towards the AS 1-ff00:1:2 is the following:

scion showpaths 1-ff00:1:2 --refresh

If there is no path, the output will look like this:

Available paths to 1-ff00:1:2
Error: no path found

It is also possible that you do not see the complete set of paths you expect or some of them are in the timeout state instead of alive.
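Such a check can be scripted. The following is a minimal Python sketch, not an official tool. It assumes that scion showpaths prints each path on a single line that contains the bracketed hop sequence and a "Status:" field; the exact output format may differ between versions. The helper name path_status and the sample line with the timeout status are made up for illustration.

```python
import re


def path_status(showpaths_output: str, hops: str) -> str:
    """Return the status reported for a hop sequence, or "missing"."""
    needle = f"[{hops}]"
    for line in showpaths_output.splitlines():
        if needle in line:
            # Assumed line format: "... [<hops>] ... Status: <state> ..."
            match = re.search(r"Status:\s*(\w+)", line)
            return match.group(1).lower() if match else "unknown"
    return "missing"


# Output copied from the "no path" case shown in this guide.
NO_PATH_OUTPUT = "Available paths to 1-ff00:1:2\nError: no path found\n"

# Hypothetical output line, invented for this example.
SAMPLE_OUTPUT = (
    "Available paths to 1-ff00:1:2\n"
    "[0] Hops: [1-ff00:1:1 2>3 1-ff00:1:2] MTU: 1472 Status: timeout\n"
)

EXPECTED_HOPS = "1-ff00:1:1 2>3 1-ff00:1:2"
print(path_status(NO_PATH_OUTPUT, EXPECTED_HOPS))
print(path_status(SAMPLE_OUTPUT, EXPECTED_HOPS))
```

Any result other than "alive" for a path you expect is a reason to continue with the steps below.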

For example, you expect to see the path [1-ff00:1:1 2>3 1-ff00:1:2], which corresponds to the link from interface 2 in 1-ff00:1:1 to interface 3 in 1-ff00:1:2, but it is not present. A natural next step for troubleshooting here is to run an IP ping between EDGE-1 and the corresponding router in the AS 1-ff00:1:2. If this works, there is connectivity on the IP underlay connecting EDGE-1 and the router in 1-ff00:1:2. In that case, the connectivity issue is most likely on the SCION level. Note that if the ping command is not working either, the cause might be a misconfiguration of the underlay network or an issue with networking hardware. Covering such scenarios is out of scope for this document.

Assuming the IP ping succeeds, it is reasonable to conclude that we have a SCION connectivity issue. We explain two possible reasons for this.
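The reasoning in this section, together with the two scenarios described next, can be condensed into a small decision helper. This is only a sketch of the guide's own rules of thumb; the function name and the returned labels are made up for illustration, and real incidents can have causes that are not covered here.

```python
def suspect_for_missing_paths(underlay_ping_ok: bool, alive_paths: int) -> str:
    """Map two observations made on EDGE-1 to the most likely problem area.

    underlay_ping_ok: result of the IP ping to the router in the remote AS.
    alive_paths: number of alive paths reported by showpaths.
    """
    if not underlay_ping_ok:
        # The IP underlay itself is broken; this guide does not cover that.
        return "underlay network or hardware (out of scope)"
    if alive_paths > 0:
        # Some paths work, so the AS certificate on our side can be ruled out.
        return "endpoint misconfiguration"
    # No path at all while the IP ping works: both scenarios remain possible.
    return "endpoint misconfiguration or AS certificate issue"


print(suspect_for_missing_paths(underlay_ping_ok=True, alive_paths=0))
```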

Scenario 1: Endpoint Misconfiguration

One potential cause is that there is an error in the configuration of EDGE-1. This is especially likely if you have just configured EDGE-1. Furthermore, if a non-empty subset of the paths is available, the AS certificate issue that we discuss in the next section can be ruled out on our side.

The issue could simply be caused by a typo in an IP address or a missing entry. In the example above, you need to check the configuration of interface 2 in your AS. If this is the problem, fix the misconfiguration, apply the corrected configuration to the appliance, and then check that you see the set of paths you expect.

Scenario 2: AS Certificate Issue

If there is no valid AS certificate configured on EDGE-1, the appliance cannot create valid path segments from the beacons because it cannot create signatures. As a result, the showpaths command will not display any path, while the ping commands to the AS 1-ff00:1:2 would still work. If you observe this combination, the AS certificate might be the source of the problem.

You can see the list of AS certificates that are configured on the appliance by running the command:

curl -k https://${mgmt_address}/api/v1/cppki/certificates

Alternatively, to inspect the list in a browser, you can navigate to https://${mgmt_address}/api/v1/cppki/certificates.

Note

${mgmt_address} needs to be replaced with the actual management address of the appliance.

If there is no AS certificate configured on EDGE-1, the output will look like:

{
   "certificate_chains": []
}
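If you want to automate this check, the response can be evaluated with a few lines of Python. The sketch below only relies on the "certificate_chains" field shown above; the function name is made up for illustration, and the structure of a non-empty entry is not assumed. Note that it only detects a missing certificate and does not verify that a listed certificate is still valid.

```python
import json


def has_as_certificate(response_body: str) -> bool:
    """Return True if the certificates endpoint lists at least one chain."""
    payload = json.loads(response_body)
    # A missing or empty "certificate_chains" list means nothing is configured.
    return len(payload.get("certificate_chains", [])) > 0


# Response copied from the "no AS certificate" case shown in this guide.
EMPTY_RESPONSE = '{"certificate_chains": []}'

if not has_as_certificate(EMPTY_RESPONSE):
    print("no AS certificate configured")
```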

The cause for a missing AS certificate could be that we have just added EDGE-1 and forgot to configure an AS certificate, that its certificate has been deleted accidentally, or that there was an issue with automatic renewal. Automatic renewal can fail, for example, when there has been a prolonged connectivity issue on the order of days.

Thus, to resolve the issue, you need to add a valid AS certificate to EDGE-1. In general, an AS certificate needs to be requested from one of the CAs of the local ISD. The initial certificate is requested with an out-of-band mechanism. Afterward, the AS certificates are periodically renewed by the appliance in a fully automatic fashion. See Certificate/TRC Provisioning for more details on listing, generating, and installing AS certificates.

All Expected Paths Are Alive

Scenario 1: Domain Misconfiguration

Assume that, according to your SCION Gateway Routing Protocol (SGRP) setup, you expect a ping command from the end host Endhost-1 (in the AS 1-ff00:1:1) to the end host Endhost-2 (in the AS 1-ff00:1:2) to work, but it does not. On the other hand, running a showpaths command towards the AS 1-ff00:1:2 displays all the paths you expect to see between these two ASes.

Note

See Routing Domains for details on the routing domain configuration.

To investigate this further, a reasonable next step is to check out the network prefixes advertised by the local SCION AS (i.e., 1-ff00:1:1) and the prefixes learned from the remote SCION ASes (in particular, 1-ff00:1:2).

These prefixes are exposed by the appliance on an HTTP status page. The gateway is the process that configures and handles everything related to IP-in-SCION tunneling, so it is the one that publishes this status page. To inspect it, you can run:

curl ${mgmt_address}/diagnostics/sgrp

Below is an example of what the output could look like:

{
   "advertise": {
      "static": [
         "10.8.0.2/32",
         "10.8.0.5/32"
      ],
      "dynamic": []
   },
   "learned": {
      "dynamic": []
   },
   "next-hops": {}
}

In this case, no prefix from remote ASes has been learned.
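Comparing this status page against your expectations can also be scripted. The following Python sketch assumes only the field names visible in the example output ("advertise" with "static" and "dynamic", and "learned" with "dynamic"); the function names and the expected prefix 10.9.0.0/24 are hypothetical and need to be replaced with values from your own routing domain configuration.

```python
import json


def summarize_sgrp(status_body: str) -> dict:
    """Collect the advertised and learned prefixes from the SGRP status page."""
    status = json.loads(status_body)
    advertise = status.get("advertise", {})
    learned = status.get("learned", {})
    return {
        "advertised": sorted(
            advertise.get("static", []) + advertise.get("dynamic", [])
        ),
        "learned": sorted(learned.get("dynamic", [])),
    }


def missing_prefixes(actual: list, expected: list) -> list:
    """Return the expected prefixes that do not appear in the actual list."""
    return sorted(set(expected) - set(actual))


# Status page content copied from the example in this guide.
STATUS = """
{
  "advertise": {"static": ["10.8.0.2/32", "10.8.0.5/32"], "dynamic": []},
  "learned": {"dynamic": []},
  "next-hops": {}
}
"""

summary = summarize_sgrp(STATUS)
# Hypothetical prefix that we expect to learn from the remote AS.
print(missing_prefixes(summary["learned"], ["10.9.0.0/24"]))
```

A non-empty result means that at least one prefix you expect to learn from a remote AS is absent from the status page.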

If no prefix is learned in your setup or, more generally, the set of learned and advertised prefixes does not match what you expect, then the domain is probably misconfigured. In that case, first fix the configuration, apply the modified configuration to the appliance, and then check the HTTP status page to confirm that the changes appear there too. Afterward, the ping command from Endhost-1 to Endhost-2 should start working.

Scenario 2: TRC Issue

In order for the appliance to join the SCION network and communicate with other nodes, it has to be configured with a set of TRCs (Trust Root Configurations). These TRCs form the trust anchors for verifying all of the control plane data that is exchanged in the SCION protocol. Therefore, the lack of a trusted TRC on the appliance will result in the loss of connectivity.

You can see the list of TRCs which are configured on the appliance by running the command:

curl -k https://${mgmt_address}/api/v1/cppki/trcs

Alternatively, to inspect the list in a browser, you can navigate to https://${mgmt_address}/api/v1/cppki/trcs.

If there is no TRC configured on the appliance, the output will be as follows:

{
   "trcs": []
}

An empty list indicates that no TRC is configured on the appliance. To fix the issue, you need to install a valid TRC on this appliance. See Certificate/TRC Provisioning for more details on generating and installing a TRC.
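The TRC query can also be performed from a script instead of curl. The sketch below uses only the Python standard library. It disables TLS certificate verification to mirror the -k flag of the curl command, which is only appropriate if you trust the management network. The function names are made up for illustration, and only the "trcs" field shown in the example output is assumed.

```python
import json
import ssl
import urllib.request


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch a JSON document without verifying the TLS certificate (like curl -k)."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return json.load(response)


def trc_list_is_empty(payload: dict) -> bool:
    """Return True if the TRC endpoint reports no configured TRC."""
    return not payload.get("trcs")


# Response copied from the "no TRC" case shown in this guide.
print(trc_list_is_empty({"trcs": []}))
```

To run the check against an appliance, pass the result of fetch_json for the URL https://${mgmt_address}/api/v1/cppki/trcs (with the management address filled in) to trc_list_is_empty.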

The cause for a missing TRC could be that we have just added EDGE-1 and forgot to configure a TRC, or that its TRC has been deleted accidentally. Note that in the second case, the showpaths command might keep functioning correctly for some time.