Troubleshooting & Runbooks

This page contains runbooks for the alerts raised by the Anapaya CA product, as well as common operations that are helpful when troubleshooting.

Alerts

InfraVaultMountPointsLimit

This alert fires if Vault is reaching the limit of mount points it can handle.

According to the official documentation, Vault can handle up to ~14000 mount points. Under normal conditions, the number of mount points should be nowhere near this limit.

Actions

  1. Check the current number of mounts via the /sys/mounts endpoint.

  2. Following the instructions in the documentation, disable the old PKI engines that are no longer needed for auditing purposes.
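As a minimal sketch of step 1, the mounts can be counted with the Vault CLI, which lists the same secrets engine mounts as the /sys/mounts endpoint. The function names below are hypothetical, and the sketch assumes that VAULT_ADDR and VAULT_TOKEN are already exported and that the CLI prints its default table output.

```shell
# Count the secrets engine mounts listed by the Vault CLI.
# Assumption: the default table output is used, which starts with two
# header lines (column names and a row of dashes).
count_mounts() {
  vault secrets list | awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Print the mount count as a whole-number percentage of a limit.
# $1: limit to compare against (defaults to the ~14000 mentioned above).
mount_usage_percent() {
  local limit="${1:-14000}"
  local count
  count="$(count_mounts)"
  echo $(( count * 100 / limit ))
}
```

Running count_mounts before and after disabling old PKI engines gives a quick confirmation that the number of mounts actually dropped.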

InfraVaultSealed

This alert fires if the Vault instance is sealed. This can happen, for example, after a reboot of the machine running Vault.

Vault stores all secrets encrypted on disk. When the Vault instance is sealed, it does not have access to the decryption key and therefore cannot access the stored secrets. For a Vault instance to operate as expected, it needs to be unsealed.

Actions

  1. Follow the official documentation on how to unseal Vault via the CLI tool or the HTTP API.
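Before and after unsealing, the seal state can be confirmed from the command line. The following is a sketch rather than an official procedure: it relies on the exit code of vault status (0 when unsealed, 2 when sealed, another value on error), and the function name is hypothetical.

```shell
# Print "unsealed", "sealed" or "error" based on the exit code of `vault status`.
# Assumption: VAULT_ADDR points at the Vault instance in question.
vault_seal_state() {
  local rc=0
  vault status >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;   # e.g. Vault is unreachable
  esac
}
```

If the state is sealed, vault operator unseal can be run once per unseal key share until the configured threshold is reached; vault_seal_state should then report unsealed.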

SCIONVaultCACertificateExpiring

This alert fires if the Vault CA Backend for the ISD-AS in question has a CA certificate provisioned that will expire in less than 4 days. If the remaining CA certificate validity falls below 3 days, AS certificate chain renewal is no longer possible.
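To double-check the thresholds by hand, the remaining validity of a certificate can be tested with openssl. This sketch assumes the CA certificate has already been exported to a PEM file; how it is exported is not covered here, and the function name is hypothetical.

```shell
# Return 0 if the certificate in PEM file $1 expires within $2 days,
# and 1 if it is still valid after that time.
# Note: `openssl x509 -checkend N` exits 0 only if the certificate is still
# valid N seconds from now, so an unreadable file is also reported as expiring.
cert_expires_within_days() {
  local pem_file="$1"
  local days="$2"
  local seconds=$(( days * 86400 ))
  if openssl x509 -checkend "$seconds" -noout -in "$pem_file" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

For example, cert_expires_within_days ca.pem 4 mirrors the alert condition (4 days are 345600 seconds), while cert_expires_within_days ca.pem 3 mirrors the point at which AS certificate chain renewal stops.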

Actions

Temporary fix
  • You can temporarily lower the validity period of the issued AS certificates. This permits issuance to continue even if the remaining CA certificate validity falls below the minimum of 3 days. To do so, reduce the default TTL of the ca/{{ isd_as }}/pki engine to a value smaller than 3 days, where isd_as is the ISD-AS whose certificate did not renew. This can be done through the CLI tool or the HTTP API.

Warning

This action does not address the root cause of the problem. It only allows AS certificates to be renewed with a validity period shorter than the default of 3 days. Once the CA certificate expires, AS certificate renewal will also stop.

Warning

This action must be manually reverted as soon as there is a new CA certificate. Otherwise, AS certificates will continue to be renewed with validity periods below the recommended 3 days.
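One possible way to apply and later revert the temporary fix with the CLI tool is sketched below. It assumes that the "default TTL" of the engine corresponds to the mount's default lease TTL, and that the ISD-AS is passed exactly as it appears in the mount path; both function names are hypothetical.

```shell
# Show the current TTL settings of the CA PKI engine for one ISD-AS.
# Record this output before changing anything so the change can be reverted.
show_as_cert_ttl() {
  local isd_as="$1"
  vault read "sys/mounts/ca/${isd_as}/pki/tune"
}

# Set the default lease TTL of the CA PKI engine for one ISD-AS.
# $1: ISD-AS as used in the mount path, $2: TTL such as 48h.
set_as_cert_default_ttl() {
  local isd_as="$1"
  local ttl="$2"
  vault secrets tune -default-lease-ttl="$ttl" "ca/${isd_as}/pki"
}
```

A typical sequence would be to run show_as_cert_ttl, lower the TTL with set_as_cert_default_ttl to a value such as 48h, and call set_as_cert_default_ttl again with the recorded value once a new CA certificate is in place.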

Troubleshooting
  1. Check if the vaultca service is running.

  2. If the vaultca service is running, check if the logs have any errors for the CA certificate renewal.

  3. If the vaultca service is not running, start the vaultca service and check the logs for the details of the renewed CA certificate or errors.

  4. Manually trigger the CA certificate update for the relevant ISD-AS(es).
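The first three steps can be combined into a small helper. This is only a sketch of the decision described above, with a hypothetical function name; it assumes a systemd host and sufficient privileges to start the service.

```shell
# Decide the next troubleshooting step from the state of the vaultca service.
vaultca_triage() {
  if systemctl is-active --quiet vaultca.service; then
    echo "vaultca is running: inspect the logs for CA certificate renewal errors"
  else
    echo "vaultca is not running: starting it, then inspect the logs"
    systemctl start vaultca.service
  fi
}
```

In both branches the logs still need to be inspected, and the CA certificate update for the affected ISD-AS(es) may still have to be triggered manually.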

SCIONVaultCACertificateInFuture

This alert fires if the Vault CA Backend for the ISD-AS in question has a CA certificate provisioned with the NotBefore time in the future. This means that no AS certificate chains can be issued at the current moment.

This issue usually occurs when clock synchronization is off.

Actions

  1. Check if the time synchronization of the clocks is working as expected. The specific commands to use differ depending on the operating system and the setup.

  2. If time synchronization is off, synchronize the clock on the device.
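As the commands depend on the setup, the following is only one possible check. It assumes a host where time synchronization is managed through systemd, so that timedatectl is available; the function name is hypothetical.

```shell
# Return 0 if systemd reports the system clock as NTP-synchronized.
# Assumption: the host uses systemd; other setups need a different check.
clock_is_synchronized() {
  [ "$(timedatectl show -p NTPSynchronized --value)" = "yes" ]
}
```

For example, `clock_is_synchronized || echo "clock is not synchronized"` can be used as a quick first check before comparing the local time against a trusted reference.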

SCIONVaultCertificateDestinguishedNameMismatch

This alert fires if the Vault CA Backend for the ISD-AS in question has a certificate provisioned where the distinguished name of the issuer or the subject belongs to a different ISD-AS. This should never happen in practice. If the alert fires, something has gone seriously wrong in the Vault setup and renewal process.

Actions

  1. This alert should never be triggered. Contact Anapaya Support.

SCIONVaultCANotRolledOver

This alert fires if the Vault CA Backend for the Vault instance has not been rolled over within the last 7 days.

In general, Vault instances have a periodic task that renews the CA certificate and rolls over the CA PKI engine every week. This alert shows that there was a problem either with renewing the CA certificate or with rolling over the PKI engine to the newest one.

Note

As soon as the CA certificate expiration date is less than 3 days in the future, the automatic AS certificate renewal will stop working.

Actions

  1. Check if the vaultca service is running.

  2. If the vaultca service is running, check if the logs have any errors for the CA rollover process.

  3. If the vaultca service is not running, start the vaultca service and check the logs for the details of the renewed CA certificate, the CA rollover or errors.

  4. Manually trigger the CA certificate update for the relevant ISD-AS(es).

SCIONVaultCARenewalNotOperating

This alert fires if the automatic CA certificate renewal, run by the vaultca service, is not operating.

Actions

  1. Check if the vaultca service is running.

  2. If the vaultca service is running, check if the logs have any errors for the CA rollover process.

  3. If the vaultca service is not running, start the vaultca service and check the logs for the details of the renewed CA certificate, the CA rollover or errors.

SCIONVaultRootCertificateExpiring

This alert fires if the Vault CA Backend for the ISD-AS in question has a root certificate provisioned that expires in the next 45 days. Once the root certificate expires, CA certificate chains can no longer be issued, and eventually AS certificate chains cannot be issued either.

There are two cases where this alert can be triggered:

  1. The TRC for the ISD was updated, but the root engine for the ISD-AS in question was not re-provisioned.

  2. The TRC is also expiring, along with the root certificate for the ISD-AS in question.

Actions

  1. If the TRC was recently updated but the root engine was not updated, then re-provision the root engine.

  2. If the TRC is also expiring, start the provisioning process for a new TRC. Initiate the TRC update process with a quorum of voting members. The details of this process depend on the governance rules of your respective ISD.

SCIONVaultRootCertificateInFuture

This alert fires if the Vault CA Backend for the ISD-AS in question has a root certificate provisioned with the NotBefore time in the future. This means that no CA certificate chains can be issued at the current moment.

This issue usually occurs when clock synchronization is off.

Actions

  1. Check if the time synchronization of the clocks is working as expected. The specific commands to use differ depending on the operating system and the setup.

  2. If time synchronization is off, synchronize the clock on the device.

SCIONVaultTRCExpiring

This alert fires if the TRC stored in the Vault CA Backend for the ISD-AS in question is expiring within 30 days. If the TRC expires, then the ISD-AS will not be able to validate information related to the path discovery process.

There are two cases in which this alert can be triggered:

  1. The TRC was recently updated, but it was not pushed to the secrets engine storage for the ISD-AS in question.

  2. The TRC has not been recently updated and will expire soon.

Actions

  1. If the TRC was recently updated but not pushed to the secrets engine, push the new TRC.

  2. If the TRC is expiring and there is no new TRC available, follow this process:

    1. Inspect the TRC to verify that it expires soon.

    2. Initiate the TRC update process with a quorum of voting members. The details of this process depend on the governance rules of your respective ISD.

SCIONVaultTRCInFuture

This alert fires if the TRC stored in the Vault CA Backend for the ISD-AS in question is provisioned with a Not_Before value in the future.

This can occur in two cases:

  1. The latest TRC was pushed to the secrets engine before the time agreed with the other voting parties for the TRC.

  2. Clock synchronization is off.

Actions

  1. If you pushed the TRC before the agreed time, push the previous TRC to the secrets engine storage for the ISD-AS in question.

  2. If time synchronization is off, synchronize the clock on the device. The specific commands to use differ depending on the operating system and the setup.

Common Operations

Start the vaultca service

  1. SSH to the given machine.

  2. Run the following command:

    systemctl start vaultca.service
    
  3. Ensure that there are no errors in the output and that the status of the service is active (running).

Check whether the vaultca service is running

  1. SSH to the given machine.

  2. Run the following command:

    systemctl status vaultca.service
    
  3. The output will include the status of the service, which should be active (running).

Inspect vaultca service logs

  1. SSH to the given machine.

  2. Run the following command:

    journalctl -u vaultca.service
    

    Tip

    To see only the recent logs, use:

    journalctl -u vaultca.service --since=<time-duration>
    

    For example, to check the logs of the last minute, pass a relative time prefixed with -:

    journalctl -u vaultca.service --since=-1m
    

    Tip

    The logs are also exposed to Loki and can be viewed with Grafana.
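When the logs are long, it can help to narrow them down to lines that mention an error. The helper below is a sketch with a hypothetical name; it assumes that error lines in the vaultca logs contain the word "error" in some capitalization, which is not confirmed by this page.

```shell
# Print vaultca log lines from the last $1 (default: 1h) that mention "error".
# Returns non-zero if no such line is found.
vaultca_recent_errors() {
  local window="${1:-1h}"
  journalctl -u vaultca.service --since="-${window}" --no-pager | grep -i "error"
}
```

For example, vaultca_recent_errors 15m limits the search to the last 15 minutes.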

Manually renew CA Certificates

  1. SSH to the given machine.

  2. Run the following command. By default, the configuration file of the vaultca service is located at /etc/vaultca/config.toml. If the configuration file is located somewhere else, make sure to pass the correct path as input to the --config flag.

    vaultca renew --config=/etc/vaultca/config.toml
    

    Note

    This will update the CA Certificates for all the ISD-ASes configured on the host. To update the CA Certificate for a specific ISD-AS, you can pass the --isd-as flag as an input.

    For example, to renew the CA Certificate for ISD-AS 1-ff00:0:1, you can run:

    vaultca renew --config=/etc/vaultca/config.toml --isd-as=1-ff00:0:1
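If only some of the ISD-ASes on a host need a renewal, the command can be repeated per ISD-AS. The loop below is a small sketch built on the renew command and flags shown above; the function name is hypothetical.

```shell
# Renew the CA certificate for each given ISD-AS, stopping at the first failure.
# $1: path to the vaultca configuration file, remaining arguments: ISD-ASes.
renew_ca_certs() {
  local config="$1"
  shift
  local isd_as
  for isd_as in "$@"; do
    echo "Renewing CA certificate for ${isd_as}"
    vaultca renew --config="$config" --isd-as="$isd_as" || return 1
  done
}
```

For example, renew_ca_certs /etc/vaultca/config.toml 1-ff00:0:1 1-ff00:0:2 renews two ISD-ASes in sequence, where 1-ff00:0:2 is only an illustrative second ISD-AS.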
    

Inspect TRC

  1. SSH to the machine that runs the Vault instance.

  2. Get the latest version of the TRC using the vault kv get command and following the official Vault documentation. The latest TRC is stored under the path ca/<isd-as>/kv/trcs/latest.

  3. Use the scion-pki tool to inspect the TRC details.
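Step 2 could look roughly like the sketch below. The function names are hypothetical, and the sketch assumes the ISD-AS is passed exactly as it appears in the storage path. This page does not state which field of the entry holds the TRC or how it is encoded, so the sketch only prints the whole entry as JSON.

```shell
# Build the storage path of the latest TRC for one ISD-AS.
trc_kv_path() {
  echo "ca/$1/kv/trcs/latest"
}

# Print the latest TRC entry of one ISD-AS as JSON.
# Assumption: VAULT_ADDR and VAULT_TOKEN are already exported.
show_latest_trc_entry() {
  vault kv get -format=json "$(trc_kv_path "$1")"
}
```

Once the TRC has been extracted from the entry into a file, it can be passed to the scion-pki tool for step 3, for example with its trc inspect subcommand if your version provides it.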

Push new TRC

  1. SSH to the machine that runs the Vault instance.

  2. Push the new version of the TRC by following the official Vault documentation. The latest TRC is stored under the path ca/<isd-as>/kv/trcs/latest.

Provision new Root CA engine

  1. SSH to the machine that runs the Vault instance.

  2. Ensure that the Root PKI engine is enabled by following the official Vault documentation. Add the following fields in the JSON-formatted body of the request:

    • The path for the PKI engine is root/<ISD-AS>/pki.

    • The type should be marked as pki.

    • You can add an optional description such as Root PKI for <ISD-AS>.

  3. Tune the Root PKI engine with default and max lease TTL. Follow the official Vault documentation and configure the default_lease_ttl to be 264h and the max_lease_ttl to be 720h.

  4. Generate a new root key and certificate bundle.

  5. Configure Root PKI with the root key and root certificate bundle. Follow the official Vault documentation and use the root/<ISD-AS>/pki/config/ca path. Include as pem_bundle the root key and certificate generated in the previous step.
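Steps 2, 3 and 5 can also be carried out with the Vault CLI instead of the HTTP API. The following sketch uses hypothetical function names and assumes that the bundle file from step 4 already exists and contains both the root key and the root certificate in PEM format.

```shell
# Return 0 if a secrets engine is already mounted at path $1.
root_pki_enabled() {
  vault secrets list | awk -v p="$1/" '$1 == p { found = 1 } END { exit !found }'
}

# Enable (if needed), tune and configure the Root PKI engine for one ISD-AS.
# $1: ISD-AS as used in the mount path, $2: PEM bundle with root key and certificate.
provision_root_pki() {
  local isd_as="$1"
  local bundle="$2"
  local path="root/${isd_as}/pki"
  if ! root_pki_enabled "$path"; then
    vault secrets enable -path="$path" -description="Root PKI for ${isd_as}" pki || return 1
  fi
  vault secrets tune -default-lease-ttl=264h -max-lease-ttl=720h "$path" || return 1
  # The @ prefix tells the Vault CLI to read the value from the file.
  vault write "${path}/config/ca" pem_bundle=@"$bundle"
}
```

For example, provision_root_pki 1-ff00:0:1 ./root-bundle.pem provisions the engine for ISD-AS 1-ff00:0:1, where the bundle file name is only illustrative.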