Basic Troubleshooting Guide¶
A network can start malfunctioning for a wide range of reasons, ranging from hardware problems to software issues. Here, we give some basic guidance on how to troubleshoot common network issues. We mainly focus on connectivity failures caused by misconfiguration of SCION services.
SCION Connectivity Issues¶
Issue: You are operating the SCION AS 1-ff00:1:1 and you have been notified that the connectivity (either SCION connectivity or the IP connectivity through an IP-in-SCION tunnel) from the host EDGE-1 in your AS to the neighboring AS 1-ff00:1:2 is lost.
Note
In practice, you should ideally be informed about such an incident through your alerting system, which sits on top of the monitoring system. You might therefore be able to extract information from the alerts that leads you to the source of the issue. This guide does not rely on such information, as it depends on your monitoring and alerting systems.
Note
The steps taken here for troubleshooting should be perceived solely as recommendations. Furthermore, they are meant to assist you with resolving only a small subset of issues you might encounter in practice.
A reasonable first step is to log into EDGE-1 and check the set of SCION paths to the AS 1-ff00:1:2.
In Not All Expected Paths Are Alive, we consider the case where you do not see the full set of paths you expect and then explain two potential causes and how to resolve them.
In All Expected Paths Are Alive, we cover the scenario where all the expected paths are actually alive. We again discuss two possible causes and explain how to resolve them.
Not All Expected Paths Are Alive¶
A basic sanity check for SCION connectivity-related issues is to log into EDGE-1 in the AS 1-ff00:1:1 and run the showpaths command. This command shows us the set of available paths to a particular destination. With the --refresh flag, we can force the scion tool to grab fresh paths from the local SCION control service. The command to run showpaths towards the AS 1-ff00:1:2 is the following:
scion showpaths 1-ff00:1:2 --refresh
If there is no path, the output will look like this:
Available paths to 1-ff00:1:2
Error: no path found
It is also possible that you do not see the complete set of paths you expect, or that some of them are in the timeout state instead of alive. For example, you expect to see the path [1-ff00:1:1 2>3 1-ff00:1:2], which corresponds to the link from interface 2 in 1-ff00:1:1 to interface 3 in 1-ff00:1:2, but it is not present.
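The check described above can be automated on the captured text output of showpaths. The following is a minimal sketch, not part of the SCION tooling: the function name check_expected_path is hypothetical, and it assumes that each path is printed on a single line that also contains its state word (alive or timeout). Only the "Error: no path found" line is taken from the output shown in this guide.

```python
def check_expected_path(showpaths_output: str, expected_hops: str) -> str:
    """Classify one expected path in the text printed by `scion showpaths`.

    Returns one of: "no-paths", "missing", "timeout", "alive", "unknown".
    """
    # Documented in this guide: printed when no path exists at all.
    if "Error: no path found" in showpaths_output:
        return "no-paths"
    for line in showpaths_output.splitlines():
        if expected_hops in line:
            # Assumption: the state word is on the same line as the hops.
            if "timeout" in line:
                return "timeout"
            if "alive" in line:
                return "alive"
            return "unknown"
    # Some paths may be listed, but not the one we are looking for.
    return "missing"


# Usage with the "no path" output shown earlier in this guide.
sample = "Available paths to 1-ff00:1:2\nError: no path found\n"
print(check_expected_path(sample, "1-ff00:1:1 2>3 1-ff00:1:2"))  # no-paths
```

A result of "missing" or "timeout" for a path you expect is the situation this section deals with.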
A natural next step for troubleshooting here is to run an IP ping between EDGE-1 and the corresponding router in the AS 1-ff00:1:2. If this works, it means that there is connectivity on the IP underlay connecting EDGE-1 and the router in 1-ff00:1:2. In that case, we can guess that the connectivity issue is on the SCION level. Note that if the ping command is not working either, then it might be a misconfiguration of the underlay network or an issue with networking hardware. Covering such scenarios is out of scope for this document.
Considering the investigations from above, it seems safe to conclude that we have a SCION connectivity issue. We explain two possible reasons for this.
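The reasoning in this section, together with the two scenarios that follow, can be condensed into a small decision function. This is only a sketch of the guide's recommendations: the function name triage, its parameters, and the returned labels are hypothetical, and real incidents may not fit any of these branches.

```python
def triage(ping_ok: bool, alive_paths: int, expected_paths: int) -> str:
    """Suggest where to look next, following the steps of this guide.

    ping_ok:        IP ping from EDGE-1 to the neighbor's router succeeds.
    alive_paths:    number of expected paths that showpaths reports as alive.
    expected_paths: number of paths you expect to see.
    """
    if alive_paths >= expected_paths:
        # Covered in "All Expected Paths Are Alive".
        return "check routing domain configuration and TRCs"
    if not ping_ok:
        # Underlay or hardware problem: out of scope for this guide.
        return "check IP underlay and networking hardware"
    if alive_paths > 0:
        # Some paths work, so a missing AS certificate is ruled out.
        return "check endpoint configuration"
    # Underlay works but there is no path at all.
    return "check endpoint configuration and AS certificate"


print(triage(ping_ok=True, alive_paths=0, expected_paths=2))
```

The order of the checks mirrors the guide: paths are inspected first, the underlay ping second, and the AS certificate only when no path is available at all.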
Scenario 1: Endpoint Misconfiguration¶
One potential cause is that there is an error in the configuration of EDGE-1. This is especially likely if you have just configured EDGE-1. Furthermore, if a non-empty subset of the paths is available, the AS certificate issue that we discuss in the next section can be ruled out on our side.
The issue could simply be caused by a typo in an IP address or a missing entry. In the example above, you need to check the configuration of interface 2 in your AS. If this is the problem, fix the misconfiguration, apply the corrected configuration to the appliance, and then check that you see the set of paths you expect.
Scenario 2: AS Certificate Issue¶
If there is no valid AS certificate configured on EDGE-1, the appliance cannot create valid path segments from the beacons because it cannot create signatures. As a result, the showpaths command will not display any path. However, the ping commands to the AS 1-ff00:1:2 would work. Thus, the AS certificate might be the source of the problem.
You can see the list of AS certificates that are configured on the appliance by running the command:
curl -k https://${mgmt_address}/api/v1/cppki/certificates
Alternatively, to inspect the list in a browser, you can navigate to https://${mgmt_address}/api/v1/cppki/certificates.
Note
${mgmt_address} needs to be replaced with the actual management address of the appliance.
If there is no AS certificate configured on EDGE-1, the output will look like:
{
"certificate_chains": []
}
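If you save the response of the request above, the empty-list case can be detected programmatically. The sketch below only relies on the certificate_chains key shown in the example output; the helper name has_as_certificate is hypothetical, and treating a missing key like an empty list is an assumption.

```python
import json


def has_as_certificate(response_body: str) -> bool:
    """Return True if the response lists at least one certificate chain."""
    data = json.loads(response_body)
    # Assumption: a missing or null key is treated like an empty list.
    return bool(data.get("certificate_chains"))


# The empty response shown above means no AS certificate is configured.
empty_response = '{"certificate_chains": []}'
print(has_as_certificate(empty_response))  # False
```

Note that a non-empty list only tells you that a certificate is present, not that it is still valid.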
The cause of a missing AS certificate could be that we have just added EDGE-1 and forgotten to configure an AS certificate, that its certificate has been deleted accidentally, or that there was an issue with automatic renewal. Automatic renewal can fail, for example, when there has been a prolonged connectivity issue on the order of days.
Thus, to resolve the issue, you need to add a valid AS certificate to EDGE-1. In general, an AS certificate needs to be requested from one of the CAs of the local ISD. The initial certificate is requested with an out-of-band mechanism. Afterward, the AS certificates are periodically renewed by the appliance in a fully automatic fashion. See Certificate/TRC Provisioning for more details on listing, generating, and installing AS certificates.
All Expected Paths Are Alive¶
Scenario 1: Domain Misconfiguration¶
Assume that, according to your SCION Gateway Routing Protocol (SGRP) setup, you expect a ping command from the end host Endhost-1 (in the AS 1-ff00:1:1) to the end host Endhost-2 (in the AS 1-ff00:1:2) to work, but it does not. On the other hand, running a showpaths command towards the AS 1-ff00:1:2 actually displays all the paths you expect to see between these two ASes.
Note
See Routing Domains for details on the routing domain configuration.
To investigate this further, a reasonable next step is to check the network prefixes advertised by the local SCION AS (i.e., 1-ff00:1:1) and the prefixes learned from the remote SCION ASes (in particular, 1-ff00:1:2). These prefixes are exposed by the appliance as an HTTP status page. The gateway process configures and handles everything related to IP-in-SCION tunneling, so it is the process that publishes the status page. To inspect it, you can run:
curl ${mgmt_address}/diagnostics/sgrp
Below is an example of what the output could look like:
{
"advertise": {
"static": [
"10.8.0.2/32",
"10.8.0.5/32"
],
"dynamic": []
},
"learned": {
"dynamic": []
},
"next-hops": {}
}
In this case, no prefix from remote ASes has been learned.
If in your setup, there is no prefix learned or more generally
the set of learned and advertised prefixes does not match what
you expect, then perhaps the domain is misconfigured.
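Comparing the status page against your expectations can be scripted. The following sketch uses only the keys visible in the example output (advertise.static, advertise.dynamic, learned.dynamic). It assumes that learned prefixes are CIDR strings like the advertised ones, which the example does not show; the helper names and the address 10.9.0.7 used for Endhost-2 are hypothetical.

```python
import ipaddress
import json


def advertised_prefixes(status: dict) -> set:
    """Collect statically and dynamically advertised prefixes."""
    adv = status.get("advertise", {})
    return set(adv.get("static", [])) | set(adv.get("dynamic", []))


def learned_prefix_for(status: dict, address: str):
    """Return the most specific learned prefix covering address, or None."""
    target = ipaddress.ip_address(address)
    best = None
    # Assumption: entries are CIDR strings such as "10.9.0.0/24".
    for prefix in status.get("learned", {}).get("dynamic", []):
        network = ipaddress.ip_network(prefix, strict=False)
        if target.version == network.version and target in network:
            if best is None or network.prefixlen > best.prefixlen:
                best = network
    return best


# The example status page from this guide.
status = json.loads("""
{
  "advertise": {"static": ["10.8.0.2/32", "10.8.0.5/32"], "dynamic": []},
  "learned": {"dynamic": []},
  "next-hops": {}
}
""")
print(sorted(advertised_prefixes(status)))
print(learned_prefix_for(status, "10.9.0.7"))  # None: nothing learned
```

If learned_prefix_for returns None for the address of Endhost-2, no learned prefix covers that host, which points at the routing domain configuration.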
If this is the case, first fix the configuration, apply the modified configuration to the appliance, and then check the HTTP status page to confirm that the changes appear there too. Afterward, the ping command from Endhost-1 to Endhost-2 should start working.
Scenario 2: TRC Issue¶
In order for the appliance to join the SCION network and communicate with other nodes, it has to be configured with a set of TRCs. These TRCs form the trust anchors for verifying all of the control plane data that is exchanged in the SCION protocol. Therefore, the lack of a trusted TRC on the appliance will result in the loss of connectivity.
You can see the list of TRCs which are configured on the appliance by running the command:
curl -k https://${mgmt_address}/api/v1/cppki/trcs
Alternatively, to inspect the list in a browser, you can navigate to https://${mgmt_address}/api/v1/cppki/trcs.
If there is no TRC configured on the appliance, the output will be as follows:
{
"trcs": []
}
This indicates that no TRC is configured on the appliance; thus, in order to fix the issue, you need to install a valid TRC on this appliance. See Certificate/TRC Provisioning for more details on generating and installing a TRC.
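The TRC query can also be issued from a script instead of curl. The sketch below is an illustration using only the Python standard library: the helper names fetch_json and trcs_configured are hypothetical, the address 192.0.2.10 is a placeholder, and disabling TLS verification mirrors the -k flag of the curl command above. Only the trcs key from the example output is relied upon.

```python
import json
import ssl
import urllib.request


def fetch_json(mgmt_address: str, path: str, timeout: float = 5.0) -> dict:
    """GET https://<mgmt_address><path> and decode the JSON response."""
    # Like `curl -k`: do not verify the certificate of the management API.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = f"https://{mgmt_address}{path}"
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return json.load(resp)


def trcs_configured(body: dict) -> bool:
    """Return True if the response lists at least one TRC."""
    return bool(body.get("trcs"))


# Example call (placeholder address, replace with your management address):
# body = fetch_json("192.0.2.10", "/api/v1/cppki/trcs")
# print(trcs_configured(body))
print(trcs_configured({"trcs": []}))  # False
```

Skipping certificate verification is only acceptable on a trusted management network; if the management endpoint has a certificate you can verify, keep the default context instead.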
The cause of a missing TRC could be that we have just added EDGE-1 and forgotten to configure a TRC, or that its TRC has been deleted accidentally. Note that in the second case, the showpaths command might keep functioning correctly for some time.