Common Operations
This page describes common operations that are helpful when troubleshooting.
Gather appliance information
To collect appliance-related information for Anapaya Customer Support:
SSH to the given machine.
Collect general information by running:
appliance-cli info > appliance.info
Fetch the appliance configuration by running:
appliance-cli get config > config.json
Warning
The appliance configuration contains secrets. Remove them before sending the information to anyone!
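The two collection steps above can be combined in a small shell function. This is a minimal sketch: the function name collect_appliance_info and the output directory layout are hypothetical, and it does not remove secrets from config.json, so that step stays manual.

```shell
#!/usr/bin/env bash
# Sketch: gather appliance.info and config.json into one directory.
# The function name and directory layout are illustrative, not part of appliance-cli.
collect_appliance_info() {
  local outdir="${1:-support-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$outdir" || return 1
  # General appliance information.
  appliance-cli info > "$outdir/appliance.info" || return 1
  # The configuration contains secrets: create the file readable by its owner only.
  # Secrets still have to be removed by hand before sharing it.
  ( umask 077; appliance-cli get config > "$outdir/config.json" ) || return 1
  echo "$outdir"
}
```

For example, collect_appliance_info support-bundle creates support-bundle/appliance.info and support-bundle/config.json.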
Collect appliance debug dump
A debug dump is a compressed journald log of the last hour. Among other things, it contains snapshots of metrics and appliance API endpoints taken at regular intervals. A debug dump should always be included when filing a bug report.
The following command takes a debug dump and stores the result in debugdump.zst:
appliance-cli debug dump -o debugdump.zst
Note
If the HTTP API of the appliance is not properly configured or not reachable, you can specify the --use-journalctl option. This option bypasses the appliance API and uses journalctl directly.
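The fallback described in the note can be scripted as follows. This sketch assumes that appliance-cli exits with a non-zero status when the API is unreachable and that --use-journalctl can be placed before -o; the function name take_debug_dump is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: take a debug dump through the appliance API and, if that fails,
# retry with --use-journalctl.
# Assumption: appliance-cli returns non-zero when the API is unreachable.
take_debug_dump() {
  local out="${1:-debugdump.zst}"
  if appliance-cli debug dump -o "$out"; then
    return 0
  fi
  echo "appliance API not reachable, retrying with --use-journalctl" >&2
  appliance-cli debug dump --use-journalctl -o "$out"
}
```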
Gather general host information
To collect host-related information for Anapaya Customer Support:
SSH to the given machine.
Run:
sudo lshw
Check docker services
To check whether the services (run as docker containers) are running:
SSH to the given machine.
Run:
docker ps -a
:$ docker ps -a
CONTAINER ID   IMAGE                   COMMAND                  CREATED      STATUS      PORTS     NAMES
c718397beaf9   scion-all:v0.32.2       "/app/scion-all netw…"   7 days ago   Up 7 days             dataplane-control
5beecfb5d081   vpp-dataplane:v0.32.2   "/usr/bin/vpp -c /sh…"   7 days ago   Up 7 days             dataplane
...
The output shows whether each service is up and how long it has been running. If a service has been up for only a short time, it may be crashlooping.
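To spot crashlooping services without reading the whole table, a sketch like the following prints only containers that are not up or that have been up for less than a minute. It relies on docker ps --format with the .Names and .Status placeholders; the function name check_containers and the one-minute threshold are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: flag containers that are not running, or that have been up for
# less than a minute (a possible sign of crashlooping).
check_containers() {
  local name status rc=0
  while IFS=$'\t' read -r name status; do
    case "$status" in
      "Up Less than a second"*|"Up "[0-9]*" second"*)
        echo "RECENT   $name ($status)"; rc=1 ;;
      Up*)
        ;;  # running for a minute or longer
      *)
        echo "DOWN     $name ($status)"; rc=1 ;;
    esac
  done < <(docker ps -a --format '{{.Names}}\t{{.Status}}')
  return "$rc"
}
```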
For further information please refer to the official Docker documentation.
Change log level
To set the log level of a service to debug and gather more information when investigating an issue:
SSH to the given machine.
Run the following command to change the log level of a specific service to debug:
appliance-cli services log level <service-name> debug
Warning
Don’t forget to revert your changes after troubleshooting.
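Because the level has to be reverted, it can help to wrap the change in a fixed time window. A minimal sketch, assuming that appliance-cli services log level also accepts the level you want to restore; the function name debug_log_window is hypothetical, and since the normal level of a service is not documented here, it is passed explicitly.

```shell
#!/usr/bin/env bash
# Sketch: enable debug logging for one service for a limited time, then set
# the level back to an explicitly given level.
debug_log_window() {
  local service="$1" seconds="$2" restore_level="$3"
  if [[ -z "$service" || -z "$seconds" || -z "$restore_level" ]]; then
    echo "usage: debug_log_window <service> <seconds> <restore-level>" >&2
    return 2
  fi
  appliance-cli services log level "$service" debug || return 1
  # While this trap is set, Ctrl-C only ends the sleep, so the restore step
  # below still runs. Note that this replaces any existing INT trap.
  trap ':' INT
  sleep "$seconds"
  trap - INT
  appliance-cli services log level "$service" "$restore_level"
}
```

For example, debug_log_window dataplane 300 info collects debug logs for five minutes, where info only stands in for whatever level the service used before.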
Inspect docker service logs
To inspect the logs of services running as docker containers:
SSH to the given machine.
If needed, you can use the following command to see the list of services:
docker ps -a
Inspect the logs by running the following command:
docker logs <service-name>
Note
To see only the recent logs use:
docker logs <service-name> --since=<time-duration>
For example, to check the logs of the last minute, run:
docker logs <service-name> --since=1m
Note
The logs are printed to stderr.
To save the logs in a file use:
docker logs <service-name> 2> <filename>
To grep through the logs, use:
docker logs <service-name> 2>&1 | grep <query>
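The redirection and grep steps above can be combined into one helper that saves a time window of a container's logs and prints the matching lines. This is a sketch: the function name save_recent_logs and the file naming scheme are hypothetical; the default window of 10m uses the same --since syntax as above.

```shell
#!/usr/bin/env bash
# Sketch: save recent logs of one container to a file, then print matching lines.
# Both stdout and stderr are captured because the services log to stderr.
save_recent_logs() {
  local service="$1" since="${2:-10m}" query="$3"
  local file="${service}-$(date +%Y%m%d-%H%M%S).log"
  docker logs --since="$since" "$service" > "$file" 2>&1 || return 1
  if [[ -n "$query" ]]; then
    grep -- "$query" "$file"
  fi
  echo "saved to $file" >&2
}
```

For example, save_recent_logs dataplane 1h error writes the last hour of logs to a file in the current directory and shows the lines containing error.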
For further information please refer to the official Docker documentation.
Restart a service
To restart a service, use the appliance-cli:
appliance-cli post debug/services/${service_name}/restart
where ${service_name} is the name of the service you want to restart.
To get the possible values for ${service_name}, use the following command:
appliance-cli get debug/services
Note
Alternatively, you can restart a service by running the following commands:
SSH to the given machine.
Run:
docker restart <service-name>
Note
The Anapaya appliance restarts failed services automatically, so restarting manually is usually only useful when a service is stuck or unresponsive.
For further information, please refer to the official Docker documentation.
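The two restart methods can be chained so that docker restart is only used when the API call fails. This sketch assumes that the service name accepted by appliance-cli is the same as the docker container name, which you should verify against appliance-cli get debug/services and docker ps -a; the function name restart_service is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: restart through the appliance API, fall back to docker restart.
# Assumption: the appliance service name equals the docker container name.
restart_service() {
  local service_name="$1"
  [[ -n "$service_name" ]] || { echo "usage: restart_service <name>" >&2; return 2; }
  if appliance-cli post "debug/services/${service_name}/restart"; then
    return 0
  fi
  echo "API restart failed, falling back to docker restart" >&2
  docker restart "$service_name"
}
```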
Clean up docker images
To remove docker images that are no longer used:
SSH to the given machine.
List all docker images by running:
docker image ls
Remove dangling images (untagged images that are not used by any container) by running:
docker image prune
For further information please refer to the official Docker documentation.
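To see how much space the cleanup actually frees, the prune can be bracketed by docker system df, which summarizes the disk usage of images, containers, and volumes. A minimal sketch with a hypothetical function name; -f only skips the confirmation prompt and does not change what is removed.

```shell
#!/usr/bin/env bash
# Sketch: show docker disk usage, remove dangling images, show usage again.
prune_dangling_images() {
  echo "Before:"
  docker system df || return 1
  # -f skips the interactive "Are you sure?" prompt.
  docker image prune -f || return 1
  echo "After:"
  docker system df
}
```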
Connect to the BGP daemon’s interactive console
To connect to the BGP daemon’s shell:
SSH to the given machine.
Open the interactive console by running:
docker exec -it frr vtysh
For further information on the console please refer to the official FRR documentation.
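For a quick check that does not need the interactive console, vtysh can run a single command with -c and exit. A sketch, assuming the container is named frr as in the command above; show bgp summary is a standard FRR command that lists BGP neighbors and their session state, and the function name frr_show is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run one FRR command non-interactively (no -it needed).
# Defaults to "show bgp summary" when no command is given.
frr_show() {
  docker exec frr vtysh -c "${1:-show bgp summary}"
}
```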
Check systemd services
To check whether the systemd services are running:
SSH to the given machine.
Run:
systemctl list-units '<service-name>'
:$ systemctl list-units 'appliance*'
UNIT                          LOAD   ACTIVE SUB     DESCRIPTION
appliance-host.service        loaded active running Anapaya Appliance Host Service
appliance-installer.service   loaded active running Anapaya Appliance Installer
...
To get a more detailed overview of a specific service, use:
systemctl status <service-name>
:$ systemctl status appliance-installer.service
● appliance-installer.service - Anapaya Appliance Installer
     Loaded: loaded (/etc/systemd/system/appliance-installer.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-02-13 08:26:29 UTC; 7min ago
   Main PID: 166 (appliance-insta)
      Tasks: 13 (limit: 38262)
     CGroup: /system.slice/appliance-installer.service
             └─166 /usr/bin/appliance-installer --config /etc/anapaya/installer/appliance-installer.toml
...
With these commands you can see whether the service is active and running and for how long it has been running.
Note
A systemd service can be restarted using sudo systemctl restart <service-name>.
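When several units need to be checked at once, systemctl is-active gives a machine-readable answer per unit. A sketch with a hypothetical function name that prints only the units that are not active and returns non-zero if there are any.

```shell
#!/usr/bin/env bash
# Sketch: report every given unit that is not active.
# systemctl is-active prints the state (e.g. active, inactive, failed).
check_units() {
  local unit state rc=0
  for unit in "$@"; do
    state=$(systemctl is-active "$unit")
    if [[ "$state" != "active" ]]; then
      echo "$unit: $state"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: check_units appliance-host.service appliance-installer.service.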
Inspect systemd service logs
To view the systemd service logs:
SSH to the given machine.
If needed, you can list the appliance-related services by running:
systemctl list-units 'appliance*'
Inspect the logs by running:
journalctl -eu <service-name>
Note
To see only recent logs, use the --since flag. For example, to see only the logs from today, use journalctl -eu <service-name> --since today.
To show the most recent 20 entries, use the -n 20 option.
Note
The logs are printed to stdout.
To save the logs in a file use:
journalctl -u <service-name> > <filename>
To grep through the logs, use:
journalctl -u <service-name> | grep <query>
Do not add -e here: it implies -n1000, so grep would only search the last 1000 entries.
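A helper for searching a unit's journal within a time window might look like this. It is a sketch with a hypothetical function name; it leaves out -e, which would limit the output to the last 1000 entries, and matches case-insensitively.

```shell
#!/usr/bin/env bash
# Sketch: case-insensitive search of one unit's journal since a given time.
search_unit_logs() {
  local unit="$1" query="$2" since="${3:-today}"
  # No -e: it implies -n1000 and would hide older entries from grep.
  journalctl -u "$unit" --since "$since" | grep -i -- "$query"
}
```

For example, search_unit_logs appliance-host.service error "1 hour ago" shows error lines from the last hour.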
Check the systemd-timesyncd service
We use the systemd-timesyncd service (systemd-timesyncd.service), which acts as an NTP client and connects to a pool of NTP servers for time synchronization. The following steps provide starting points for troubleshooting it. For further information, please refer to the official documentation.
SSH to the given machine.
Check whether the system clock is synchronized and whether the NTP service is active:
timedatectl status
Restart the service:
sudo systemctl restart systemd-timesyncd.service
Find the configured NTP servers:
grep NTP /etc/systemd/timesyncd.conf
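The synchronization check can also be scripted. This sketch assumes a systemd version that provides timedatectl show; it reads the NTPSynchronized property, which is yes when the clock is synchronized. The function name clock_synced is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if the clock is NTP-synchronized, 1 if not, 2 on error.
# Assumption: this systemd version supports "timedatectl show".
clock_synced() {
  local synced
  synced=$(timedatectl show -p NTPSynchronized --value) || return 2
  if [[ "$synced" == "yes" ]]; then
    echo "clock synchronized"
  else
    echo "clock NOT synchronized; check systemd-timesyncd.service" >&2
    return 1
  fi
}
```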
Disk usage analysis
This section contains commands that help when investigating a lack of disk space.
SSH to the given machine.
Check the available disk space:
df -h <path>
List the files in a directory:
ls -l <path>
The du command gives a more detailed overview of how much space each directory consumes. You can vary the --max-depth option or the starting directory:
du -cha --max-depth=1 / 2>/dev/null | grep -E '^[0-9.,]+[MG][[:space:]]'
For further information about the du command, please refer to the official documentation.
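To find the largest directories quickly, the du output can be sorted instead of filtered. A sketch with a hypothetical function name: -x keeps du on one file system, sort -rh orders human-readable sizes such as 512K, 3.1M, and 2G from largest to smallest, and the starting path itself also appears with its total size.

```shell
#!/usr/bin/env bash
# Sketch: list the N largest entries directly below a path, largest first.
largest_entries() {
  local path="${1:-/}" n="${2:-10}"
  # Permission errors are hidden; run with sudo for a complete picture.
  du -xh --max-depth=1 "$path" 2>/dev/null | sort -rh | head -n "$n"
}
```

For example, largest_entries /var 5 shows the five largest entries under /var.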
Clean up disk space
There are several ways to free up disk space, grouped below by what consumes the space.
Systemd journal logs
Check how much disk space the systemd journal logs use:
journalctl --disk-usage
Clear the logs that are older than 3 days:
sudo journalctl --vacuum-time=3d
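The journal cleanup can be wrapped so that usage is shown before and after. Vacuuming only removes archived journal files, so this sketch rotates the journal first to archive the currently active files. The function name vacuum_journal is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rotate the journal, delete archived entries older than the given
# age (default 3d, as above), and report disk usage before and after.
vacuum_journal() {
  local age="${1:-3d}"
  journalctl --disk-usage
  sudo journalctl --rotate || return 1
  sudo journalctl --vacuum-time="$age" || return 1
  journalctl --disk-usage
}
```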
Docker images
Remove unused docker images as described in Clean up docker images.
Fix topology synchronization error
Appliances in a cluster share their topology information with each other. This either happens statically through configuration or dynamically through an exchange protocol. For further information on how to configure topology synchronization in the appliance configuration, refer to Topology Synchronization. The instructions below should help to identify a misconfiguration.
Check the logs of the appliance-controller service. The logs should contain an error describing the misconfiguration.
Fix the misconfigured appliances and update them.
Inspect SCION paths used for IP-in-SCION tunneling
While troubleshooting SCION connectivity, it is often useful to check the available paths for each domain. This section shows how to do so.
SSH to the given machine.
Show the currently available paths for all domains and traffic matchers by running the following command. This also shows whether the path is alive, dead (no probes are passing through), expired or similar.
appliance-cli inspect scion-tunneling summary --all-paths
Show the currently used paths for a specific domain:
appliance-cli inspect scion-tunneling summary --all-paths \
    --domain <domain>
To show the used paths for a specific traffic matcher within the given domain, run:
appliance-cli inspect scion-tunneling summary --all-paths \
    --domain <domain> --traffic-matcher <traffic matcher>
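The three variants above differ only in their filters, so a small wrapper can build the arguments. A sketch with a hypothetical function name; following the example above, it only passes a traffic matcher together with a domain.

```shell
#!/usr/bin/env bash
# Sketch: scion_paths [domain] [traffic-matcher]
scion_paths() {
  local domain="$1" matcher="$2"
  local args=(inspect scion-tunneling summary --all-paths)
  if [[ -n "$domain" ]]; then
    args+=(--domain "$domain")
    if [[ -n "$matcher" ]]; then
      args+=(--traffic-matcher "$matcher")
    fi
  elif [[ -n "$matcher" ]]; then
    # Assumption based on the example above: a matcher needs a domain.
    echo "a traffic matcher requires a domain" >&2
    return 2
  fi
  appliance-cli "${args[@]}"
}
```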
Ping the underlay network
When investigating an issue, it is often helpful to determine whether the underlying IP connectivity is the problem. To check it, run ping <destination> from the given machine.
For further information, please refer to the official ping documentation.
Tip
By default, the ping command runs until it is interrupted. To send a fixed number of packets, use:
ping -c <number> <destination>
You can set the source either as an address or as an interface name:
ping <destination> -I <interface/address>
The default time interval between successive packet transmissions is one second. You can specify a custom interval in seconds:
ping -i <interval> <destination>
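The options from the tip can be combined into one bounded check that reports packet loss. This is a sketch with a hypothetical function name; it assumes the iputils ping summary line, which contains a value such as 0% packet loss.

```shell
#!/usr/bin/env bash
# Sketch: underlay_ping <destination> [count] [interface-or-source-address]
# Prints the packet loss and returns ping's exit status.
underlay_ping() {
  local dest="$1" count="${2:-5}" source="$3"
  local args=(-c "$count")
  if [[ -n "$source" ]]; then
    args+=(-I "$source")
  fi
  local out
  out=$(ping "${args[@]}" "$dest")
  local rc=$?
  grep -o '[0-9.]*% packet loss' <<< "$out"
  return "$rc"
}
```

For example, underlay_ping 192.0.2.1 10 eth1 sends ten probes out of eth1.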