Common Operations

This page describes common operations that are helpful when troubleshooting.

Gather appliance information

To collect appliance-related information for Anapaya Customer Support:

  1. SSH to the given machine.

  2. Collect general information by running:

    appliance-cli info > appliance.info
    
  3. Fetch the appliance configuration by running:

    appliance-cli get config > config.json
    

Warning

The appliance configuration contains secrets. Remove them before sending the file to anyone!
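
Before sharing config.json, it can help to list the lines that look like they hold secrets. The following is a minimal sketch, not an Anapaya tool. The helper name list_secret_candidates and the key-name patterns (secret, password, token, private, key) are assumptions about how secret fields might be named, so review the whole file manually as well.

```shell
# Print the line number and content of every line in a JSON file whose key
# name looks like it could hold a secret. The pattern list is an assumption;
# extend it to match the fields actually present in your configuration.
list_secret_candidates() {
  grep -n -i -E '"[^"]*(secret|password|token|private|key)[^"]*"[[:space:]]*:' "$1"
}
```

For example, list_secret_candidates config.json prints the candidate lines so that you can redact their values before sending the file.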

Collect appliance debug dump

A debug dump is a compressed journald log of the last hour. Among other things, it contains snapshots of metrics and appliance API endpoints taken at regular intervals. Always include a debug dump when filing a bug report.

The following command takes a debug dump and stores the result in debugdump.zst:

appliance-cli debug dump -o debugdump.zst

Note

If the HTTP API of the appliance is not properly configured or not reachable, you can specify the --use-journalctl option. This option bypasses the appliance API and uses journalctl directly.
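
The fallback described in this note can be scripted. The sketch below assumes that appliance-cli exits with a non-zero status when the API is not reachable and that the two options can be combined in the order shown. The helper name take_debug_dump and the APPLIANCE_CLI variable are hypothetical and exist only for this example.

```shell
# Command used to invoke the CLI; overridable, for example for testing.
APPLIANCE_CLI=${APPLIANCE_CLI:-appliance-cli}

# Usage: take_debug_dump [output-file]
# Take a debug dump via the appliance API and, if that fails, retry by
# reading from journalctl directly.
take_debug_dump() {
  out=${1:-debugdump.zst}
  "$APPLIANCE_CLI" debug dump -o "$out" && return 0
  echo "API-based dump failed, retrying with --use-journalctl" >&2
  "$APPLIANCE_CLI" debug dump --use-journalctl -o "$out"
}
```

Calling take_debug_dump without arguments writes the result to debugdump.zst.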

Gather general host information

To collect host-related information for Anapaya Customer Support:

  1. SSH to the given machine.

  2. Run:

    sudo lshw
    

Check docker services

To check whether the services (run as docker containers) are running:

  1. SSH to the given machine.

  2. Use docker ps -a:

    $ docker ps -a
    CONTAINER ID  IMAGE                  COMMAND                 CREATED     STATUS     PORTS  NAMES
    c718397beaf9  scion-all:v0.32.2      "/app/scion-all netw…"  7 days ago  Up 7 days         dataplane-control
    5beecfb5d081  vpp-dataplane:v0.32.2  "/usr/bin/vpp -c /sh…"  7 days ago  Up 7 days         dataplane
    ...
    

The output shows whether each service is up and for how long it has been running. If a service has been up for only a very short time, it may be crash-looping.
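
To spot such services quickly, the status column can be filtered. This is a minimal sketch: the helper name flag_recent_restarts is hypothetical, and the status wording it matches ("Restarting", "Up 12 seconds", "Up About a minute", and so on) is an assumption about Docker's human-readable output that may differ between versions.

```shell
# Read "name|status" lines, as produced by
#   docker ps -a --format '{{.Names}}|{{.Status}}'
# and print the containers that are restarting or have been up for less
# than an hour. Containers that have been up for hours or days are skipped.
flag_recent_restarts() {
  awk -F'|' '
    $2 ~ /^Restarting/ ||
    $2 ~ /^Up (Less than a second|[0-9]+ seconds?|About a minute|[0-9]+ minutes?)( |$)/ {
      print $1 ": " $2
    }'
}
```

Pipe the docker ps command from the comment into flag_recent_restarts. A service that shows up repeatedly with a short uptime is a candidate for a closer look at its logs.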

For further information please refer to the official Docker documentation.

Change log level

To change the log level to debug to gather more information when investigating an issue:

  1. SSH to the given machine.

  2. Run the following command to change the log level of a specific service to debug:

    appliance-cli services log level <service-name> debug
    

Warning

Don’t forget to revert your changes after troubleshooting.
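
One way to make sure the level is reverted is to wrap the investigation in a helper that restores it afterwards. This is only a sketch: the helper name with_debug_logging and the APPLIANCE_CLI variable are hypothetical, and you must pass the level to restore yourself because this page does not state the default level.

```shell
# Command used to invoke the CLI; overridable, for example for testing.
APPLIANCE_CLI=${APPLIANCE_CLI:-appliance-cli}

# Usage: with_debug_logging <service-name> <level-to-restore> <command> [args...]
# Switch the service to debug, run the command, then restore the given
# level, even if the command fails. Returns the command's exit status.
with_debug_logging() {
  service=$1
  restore_level=$2
  shift 2
  "$APPLIANCE_CLI" services log level "$service" debug || return 1
  status=0
  "$@" || status=$?
  "$APPLIANCE_CLI" services log level "$service" "$restore_level"
  return "$status"
}
```

For example, with_debug_logging <service-name> <previous-level> sleep 300 keeps debug logging enabled for five minutes and then reverts it. The helper does not restore the level if the shell itself is interrupted.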

Inspect docker service logs

To inspect the logs of services running as docker containers:

  1. SSH to the given machine.

  2. If needed, you can use the following command to see the list of services:

    docker ps -a
    
  3. Inspect the logs by running the following command:

    docker logs <service-name>
    

Note

To see only the recent logs use:

docker logs <service-name> --since=<time-duration>

For example, to check the logs of the last minute, run:

docker logs <service-name> --since=1m

Note

The logs are printed to stderr.

To save the logs in a file use:

docker logs <service-name> 2> <filename>

To grep through the logs use:

docker logs <service-name> 2>&1 | grep <query>
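
The time filter and the search can be combined. The helper below is a small sketch and docker_logs_grep is a hypothetical name. It relies only on the docker logs options shown above.

```shell
# Usage: docker_logs_grep <service-name> <time-duration> <query>
# Print the log lines from the given period that match the query. The logs
# are written to stderr, so stderr is merged into stdout before filtering.
docker_logs_grep() {
  docker logs --since="$2" "$1" 2>&1 | grep -e "$3"
}
```

For example, docker_logs_grep dataplane 10m error searches the last ten minutes of the dataplane service logs for the string error.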

For further information please refer to the official Docker documentation.

Restart a service

To restart a service you can use the appliance-cli:

appliance-cli post debug/services/${service_name}/restart

where ${service_name} is the name of the service you want to restart. To get the possible values for the ${service_name}, use the following command:

appliance-cli get debug/services

Note

Alternatively, you can restart a service by running the following commands:

  1. SSH to the given machine.

  2. Run docker restart <service-name>.

Note

The Anapaya appliance restarts failed services automatically, so manual restarting is likely to be useful only when the service is stuck and/or unresponsive.

For further information, please refer to the official Docker documentation.

Clean up docker images

To remove docker images that are no longer used:

  1. SSH to the given machine.

  2. List all docker images by running:

    docker image ls
    
  3. Remove old unused images by running:

    docker image prune
    

For further information please refer to the official Docker documentation.

Connect to the BGP daemon’s interactive console

To connect to the BGP daemon’s shell:

  1. SSH to the given machine.

  2. Open the interactive console by running:

    docker exec -it frr vtysh
    

For further information on the console please refer to the official FRR documentation.

Check systemd services

To check if the systemd services are running:

  1. SSH to the given machine.

  2. Run systemctl list-units '<service-name>':

    $ systemctl list-units 'appliance*'
    UNIT                        LOAD   ACTIVE SUB     DESCRIPTION
    appliance-host.service      loaded active running Anapaya Appliance Host Service
    appliance-installer.service loaded active running Anapaya Appliance Installer
    ...
    
  3. To get a more detailed overview of a specific service, use systemctl status <service-name>:

    $ systemctl status appliance-installer.service
           appliance-installer.service - Anapaya Appliance Installer
       Loaded: loaded (/etc/systemd/system/appliance-installer.service; enabled; vendor preset: enabled)
       Active: active (running) since Mon 2023-02-13 08:26:29 UTC; 7min ago
    Main PID: 166 (appliance-insta)
       Tasks: 13 (limit: 38262)
       CGroup: /system.slice/appliance-installer.service
             └─166 /usr/bin/appliance-installer --config /etc/anapaya/installer/appliance-installer.toml
    ...
    

With these commands you can see whether the service is active and running and for how long it has been running.

Note

A systemd service can be restarted using systemctl restart <service-name>.
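
When several units need to be checked at once, systemctl is-active can be used in a loop. This is a minimal sketch and report_units is a hypothetical helper name.

```shell
# Usage: report_units <unit>...
# Print one line per unit with its state and return non-zero if any of
# the units is not active.
report_units() {
  rc=0
  for unit in "$@"; do
    if systemctl is-active --quiet "$unit"; then
      echo "$unit: active"
    else
      echo "$unit: NOT active"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, report_units appliance-host.service appliance-installer.service prints the state of both units and can be used as a condition in a script.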

Inspect systemd service logs

To view the systemd service logs:

  1. SSH to the given machine.

  2. If needed, you can list the appliance-related services by running:

    systemctl list-units 'appliance*'
    
  3. Inspect the logs by running:

    journalctl -eu <service-name>
    

Note

To see only the recent logs use the --since flag. For example, to see only the logs from today use journalctl -eu <service-name> --since today.

To show the most recent 20 entries, use the -n 20 option.

Note

The logs are printed to stdout.

To save the logs in a file use:

journalctl -u <service-name> > <filename>

To grep through the logs, omit the -e flag, because it limits the output to the most recent 1000 entries:

journalctl -u <service-name> | grep <query>

Check the systemd-timesyncd service

We use the systemd-timesyncd service (systemd-timesyncd.service), which acts as an NTP client and connects to a pool of NTP servers for time synchronization. The following steps provide some starting points for troubleshooting it. For further information please refer to the official documentation.

  1. SSH to the given machine.

  2. Check if the system clock is synchronized and if NTP service is active using:

    timedatectl status
    
  3. Check if the service is running (see Check systemd services).

  4. Check the logs of the service (see Inspect systemd service logs).

  5. Restart the service:

    systemctl restart systemd-timesyncd.service
    
  6. Find the configured NTP servers:

    grep NTP /etc/systemd/timesyncd.conf
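
The synchronization check from step 2 can also be scripted. This sketch reads the output of timedatectl status and looks for the "System clock synchronized: yes" line. That label is the one printed by recent systemd versions and is an assumption here, because older versions may word it differently. The helper name clock_is_synchronized is hypothetical.

```shell
# Read the output of `timedatectl status` on stdin and succeed only if it
# reports a synchronized system clock.
clock_is_synchronized() {
  grep -q -E '^[[:space:]]*System clock synchronized:[[:space:]]*yes[[:space:]]*$'
}
```

For example, timedatectl status | clock_is_synchronized || echo "clock not synchronized" prints a message only when the clock is out of sync.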
    

Disk usage analysis

This section contains some helpful commands for investigating why a machine is running out of disk space.

  1. SSH to the given machine.

  2. Check the available disk space:

    df -h <path>
    
  3. List the files in a directory:

    ls -l <path>
    
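  4. Use the du command to get a more detailed overview of which directory consumes how much space. You can vary the --max-depth option or the starting directory. The grep is a rough filter that keeps only lines containing M or G, which mostly selects entries of a megabyte or more: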

    du -cha --max-depth=1 / | grep -E "M|G"
    

For further information about the du command please refer to the official documentation.
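
As an alternative to filtering with grep, the du output can be sorted by size. The sketch below assumes GNU du and sort, and largest_dirs is a hypothetical helper name. It reports sizes in KiB so that a plain numeric sort works.

```shell
# Usage: largest_dirs <path> [count]
# List the largest directories directly below <path>, biggest first, with
# sizes in KiB. -x keeps du on a single filesystem. The first line is
# typically the total for <path> itself.
largest_dirs() {
  du -xk --max-depth=1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, sudo is usually needed to read all directories, so run the du pipeline from the function body with sudo when inspecting / or /var.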

Clean up disk space

There are several ways to free up disk space. The options below are grouped by what consumes the space.

Systemd journal logs

  1. Check how much disk space the systemd journal logs use:

    journalctl --disk-usage
    
  2. Clear the logs that are older than 3 days:

    sudo journalctl --vacuum-time=3d
    

Docker images

To remove docker images that are no longer used, follow the steps in Clean up docker images.

Fix topology synchronization error

Appliances in a cluster share their topology information with each other. This either happens statically through configuration or dynamically through an exchange protocol. For further information on how to configure topology synchronization in the appliance configuration, refer to Topology Synchronization. The instructions below should help to identify a misconfiguration.

  1. Check the logs of the appliance-controller service. The logs should contain an error describing the misconfiguration.

  2. Fix the misconfigured appliances and update them.

Inspect SCION paths used for IP-in-SCION tunneling

While troubleshooting SCION connectivity, it is often useful to check the available paths for each domain. This section provides an overview on how to achieve this.

  1. SSH to the given machine.

  2. Show the currently available paths for all domains and traffic matchers by running the following command. This also shows whether the path is alive, dead (no probes are passing through), expired or similar.

    appliance-cli inspect scion-tunneling summary --all-paths
    
  3. Show the currently used paths for a specific domain:

    appliance-cli inspect scion-tunneling summary --all-paths \
      --domain <domain>
    

    For the used paths for a specific traffic matcher within the given domain, run:

    appliance-cli inspect scion-tunneling summary --all-paths \
      --domain <domain> --traffic-matcher <traffic matcher>
    

Ping the underlay network

When investigating an issue, it is often helpful to determine whether the underlying IP connectivity is the problem.

For further information, please refer to the official ping documentation.

Tip

The ping command runs indefinitely unless you limit the number of packets with the -c option:

ping -c <number> <destination>

You can set the source either by address or by interface name:

ping <destination> -I <interface/address>

The default time interval between successive packet transmissions is one second. You can specify a custom interval in seconds:

ping -i <interval> <destination>
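
These options can be combined into a small reachability check. This is a minimal sketch and check_underlay is a hypothetical helper name. It relies on ping returning exit code 0 when at least one reply was received, so it does not detect partial packet loss.

```shell
# Usage: check_underlay <destination> [interface-or-address]
# Send 5 probes at the default one-second interval, optionally from the
# given source interface or address, and report the result.
check_underlay() {
  dest=$1
  src=${2:-}
  status=0
  if [ -n "$src" ]; then
    ping -c 5 -I "$src" "$dest" >/dev/null 2>&1 || status=$?
  else
    ping -c 5 "$dest" >/dev/null 2>&1 || status=$?
  fi
  if [ "$status" -eq 0 ]; then
    echo "$dest: reachable"
  else
    echo "$dest: NOT reachable (ping exit code $status)"
  fi
  return "$status"
}
```

For example, check_underlay <destination> <interface/address> tests the underlay from a specific source. Run ping directly without redirecting its output to see the per-packet results.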