Runbooks

These are the runbooks for the SCION CA product. There is one entry for each alert rule in the recommended set of alert rules that we provide. Use each entry as a reference for the actions to take when the corresponding alert fires.

InfraVaultMountPointsLimit

This alert fires if Vault is approaching the limit of mount points it can handle.

According to the official documentation, Vault can handle up to ~14000 mount points. Under normal conditions, the number of mount points should not be anywhere near this limit.

Actions

  1. Check the current number of mounts via the /sys/mounts endpoint.

  2. Disable the old PKI engines that are no longer needed for auditing purposes following the instructions in the documentation.
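The check in step 1 can be sketched in Python as follows. This is a minimal sketch, not a supported tool: it assumes the Vault HTTP API is reachable at an address you supply, that your token may read /sys/mounts, and that the response lists the mounts under the "data" key. The function names and the MOUNT_LIMIT constant are illustrative only.

```python
import json
import urllib.request

# Approximate limit quoted in this runbook; not read from Vault itself.
MOUNT_LIMIT = 14000


def fetch_mounts(vault_addr: str, token: str) -> dict:
    """GET /v1/sys/mounts and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{vault_addr}/v1/sys/mounts",
        headers={"X-Vault-Token": token},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_mounts(body: dict) -> dict:
    """Count all mounts and list the PKI mounts among them."""
    # Assumption: mounts sit under "data"; fall back to the top level otherwise.
    mounts = body.get("data", body)
    entries = {
        path: info
        for path, info in mounts.items()
        if isinstance(info, dict) and "type" in info
    }
    pki = sorted(path for path, info in entries.items() if info["type"] == "pki")
    total = len(entries)
    return {"total": total, "pki": pki, "usage": total / MOUNT_LIMIT}
```

Calling summarize_mounts(fetch_mounts(addr, token)) gives the total mount count, the PKI mount paths that are candidates for review in step 2, and the fraction of the limit in use.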

InfraVaultSealed

This alert fires if the Vault instance is sealed. This can happen, for example, after a reboot of the machine running Vault.

Vault stores all secrets encrypted on disk. When the Vault instance is sealed, it does not have access to the decryption key and therefore cannot access the stored secrets. For a Vault instance to operate as expected, it needs to be unsealed.

Actions

  1. Follow the official documentation on how to unseal Vault via the CLI or the HTTP API.
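Before unsealing, it can help to confirm the seal state and see how many key shares are still missing. The following is a minimal Python sketch, assuming a Shamir-sealed Vault reachable at an address you supply; it reads the /sys/seal-status endpoint, which does not require a token. The function names are illustrative only.

```python
import json
import urllib.request


def fetch_seal_status(vault_addr: str) -> dict:
    """GET /v1/sys/seal-status and return the decoded JSON body."""
    url = f"{vault_addr}/v1/sys/seal-status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def describe_seal_status(status: dict) -> str:
    """Summarize the seal state from a seal-status response."""
    # Treat a missing "sealed" field as sealed, to stay on the safe side.
    if not status.get("sealed", True):
        return "unsealed"
    threshold = status.get("t", 0)       # key shares required to unseal
    progress = status.get("progress", 0)  # key shares already submitted
    remaining = max(threshold - progress, 0)
    return f"sealed: {remaining} of {threshold} key shares still required"
```

The unseal operation itself is best done by following the official documentation, since it involves handling the unseal key shares.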

SCIONVaultCACertificateExpiring

This alert fires if the Vault CA Backend for the ISD-AS in question has a CA certificate provisioned that will expire in less than 4 days. If the remaining CA certificate validity falls below 3 days, AS certificate chain renewal will no longer be possible.
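The two thresholds can be illustrated with a short Python sketch. It only restates the 4-day and 3-day limits from the paragraph above; the function name and the state labels are illustrative and are not part of the product.

```python
from datetime import datetime, timedelta, timezone

ALERT_THRESHOLD = timedelta(days=4)  # alert fires below this remaining validity
RENEWAL_MINIMUM = timedelta(days=3)  # AS certificate renewal stops below this


def classify_ca_validity(not_after: datetime, now: datetime) -> str:
    """Classify a CA certificate by its remaining validity."""
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining < RENEWAL_MINIMUM:
        return "renewal-blocked"
    if remaining < ALERT_THRESHOLD:
        return "alerting"
    return "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A certificate with 3.5 days left triggers the alert but can still issue.
    print(classify_ca_validity(now + timedelta(days=3, hours=12), now))
```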

Actions

Temporary fix

  • You can temporarily lower the validity period of the issued AS certificates. This permits issuance to continue, even if the remaining CA certificate validity falls below the minimum of 3 days. To do so, reduce the default TTL in the ca/{{ isd_as }}/pki engine to a value smaller than 3 days, where isd_as is the value of the ISD-AS number for which the certificate did not renew. This can be achieved through the CLI tool or the HTTP API.

    Warning

    This action does not solve the underlying problem. It only allows AS certificates to be renewed with a validity period shorter than the default of 3 days. Once the CA certificate expires, AS certificate renewal will also stop.

    Warning

    This action must be manually reverted as soon as there is a new CA certificate, otherwise AS certificates will be renewed with validity periods below the recommended 3 days.
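As one possible way to apply the temporary fix over the HTTP API, the following Python sketch builds a request against Vault's mount tuning endpoint. It rests on several assumptions: that ca/{{ isd_as }}/pki is the mount path of the engine, that the "default TTL" corresponds to the mount's default_lease_ttl tunable, and that the ISD-AS value shown in the usage example matches how your deployment spells it in the path. The function name is illustrative only.

```python
import json
import urllib.request


def build_tune_request(
    vault_addr: str, token: str, isd_as: str, ttl: str
) -> urllib.request.Request:
    """Build a POST to /v1/sys/mounts/<mount>/tune that sets the default TTL."""
    # Assumption: the engine is mounted at ca/<isd_as>/pki.
    mount = f"ca/{isd_as}/pki"
    payload = {"default_lease_ttl": ttl}
    return urllib.request.Request(
        f"{vault_addr}/v1/sys/mounts/{mount}/tune",
        data=json.dumps(payload).encode(),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical address, token, and ISD-AS spelling; replace with your own.
    req = build_tune_request("http://127.0.0.1:8200", "s.example", "1-ff00_0_110", "48h")
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req, timeout=10)
```

Record the previous TTL before changing it, so that the value can be restored once a new CA certificate is in place.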

Troubleshooting

  1. Check if the vaultca service is running.

  2. If the vaultca service is running, check if the logs have any errors for the CA certificate renewal.

  3. If the vaultca service is not running, start the vaultca service and check the logs for the details of the renewed CA certificate or errors.

  4. Manually trigger the CA certificate update for the relevant ISD-AS(es).

SCIONVaultCACertificateInFuture

This alert fires if the Vault CA Backend for the ISD-AS in question has a CA certificate provisioned with the NotBefore time in the future. This means that no AS certificate chains can be issued at the current moment.

This issue usually occurs when clock synchronization is off.

Actions

  1. Check if the time synchronization of the clocks is working as expected. The specific commands to use differ depending on the operating system and the setup.

  2. If time synchronization is off, synchronize the clock on the device.
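On hosts that use systemd, one way to perform the check in step 1 is to read the NTPSynchronized property from timedatectl. The Python sketch below assumes such a host; other operating systems and time daemons need different commands. The function names are illustrative only.

```python
import subprocess


def parse_timedatectl(output: str) -> dict:
    """Parse the KEY=VALUE lines printed by `timedatectl show`."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def clock_is_synchronized(props: dict) -> bool:
    """Return True only if systemd reports the clock as NTP-synchronized."""
    return props.get("NTPSynchronized") == "yes"


def read_timedatectl() -> dict:
    """Run `timedatectl show` (systemd hosts only) and parse its output."""
    result = subprocess.run(
        ["timedatectl", "show"], check=True, capture_output=True, text=True
    )
    return parse_timedatectl(result.stdout)
```

If clock_is_synchronized(read_timedatectl()) returns False, continue with step 2.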

SCIONVaultCertificateDestinguishedNameMismatch

This alert fires if the Vault CA Backend for the ISD-AS in question has a certificate provisioned where the distinguished name of the issuer or the subject belongs to a different ISD-AS. This should never happen in practice. If the alert fires, something has gone seriously wrong in the Vault setup or the renewal process.

Actions

  1. This alert should never be triggered. Contact Anapaya Support.

SCIONVaultCANotRolledOver

This alert fires if the Vault CA Backend for the Vault instance has not been rolled over within the last 7 days.

In general, Vault instances have a periodic task that renews the CA certificate and rolls over the CA PKI engine every week. This alert shows that there was a problem either with renewing the CA certificate or with rolling over the PKI engine to the newest one.

Warning

As soon as the CA certificate expiration date is less than 3 days in the future, the automatic AS certificate renewal will stop working.

Actions

  1. Check if the vaultca service is running.

  2. If the vaultca service is running, check if the logs have any errors for the CA rollover process.

  3. If the vaultca service is not running, start the vaultca service and check the logs for the details of the renewed CA certificate, the CA rollover or errors.

  4. Manually trigger the CA certificate update for the relevant ISD-AS(es).

SCIONVaultCARenewalNotOperating

This alert fires if the automatic CA certificate renewal, run by the vaultca service, is not operating.

Actions

  1. Check if the vaultca service is running.

  2. If the vaultca service is running, check if the logs have any errors for the CA rollover process.

  3. If the vaultca service is not running, start the vaultca service and check the logs for the details of the renewed CA certificate, the CA rollover or errors.

SCIONVaultRootCertificateExpiring

This alert fires if the Vault CA Backend for the ISD-AS in question has a root certificate provisioned that expires within the next 45 days. If the root certificate expires, CA certificate chains can no longer be issued, and eventually AS certificate chains cannot be issued either.

There are two cases where this alert can be triggered:

  1. The TRC for the ISD was updated, but the root engine for the ISD-AS in question was not re-provisioned.

  2. The TRC is also expiring, along with the root certificate for the ISD-AS in question.

Actions

  1. If the TRC was recently updated but the root engine was not updated, then re-provision the root engine.

  2. If the TRC is also expiring, start the provisioning process for a new TRC by initiating the TRC update process with a quorum of voting members. The details of this process depend on the governance rules of your respective ISD.

SCIONVaultRootCertificateInFuture

This alert fires if the Vault CA Backend for the ISD-AS in question has a root certificate provisioned with the NotBefore time in the future. This means that no CA certificate chains can be issued at the current moment.

This issue usually occurs when clock synchronization is off.

Actions

  1. Check if the time synchronization of the clocks is working as expected. The specific commands to use differ depending on the operating system and the setup.

  2. If time synchronization is off, synchronize the clock on the device.

SCIONVaultTRCExpiring

This alert fires if the TRC stored in the Vault CA Backend for the ISD-AS in question is expiring within 30 days. If the TRC expires, then the ISD-AS will not be able to validate information related to the path discovery process.

There are two cases in which this alert can be triggered:

  1. The TRC was recently updated, but it was not pushed to the secrets engine storage for the ISD-AS in question.

  2. The TRC has not been recently updated and will expire soon.

Actions

  1. If the TRC was recently updated but not pushed to the secrets engine, push the new TRC.

  2. If the TRC is expiring and there is no new TRC available, follow this process:

    1. Inspect the TRC to verify that it expires soon.

    2. Initiate the TRC update process with a quorum of voting members. The details of this process depend on the governance rules of your respective ISD.

SCIONVaultTRCInFuture

This alert fires if the TRC stored in the Vault CA Backend for the ISD-AS in question is provisioned with a NotBefore time in the future.

This can occur in two cases:

  1. The latest TRC was pushed to the secrets engine before the time agreed with the other voting parties for the TRC.

  2. Clock synchronization is off.

Actions

  1. If you pushed the TRC before the agreed time, push the previous TRC to the secrets engine storage for the ISD-AS in question.

  2. If time synchronization is off, synchronize the clock on the device. The specific commands to use differ depending on the operating system and the setup.