Troubleshooting & Runbooks¶
This documentation page contains runbooks for the alerts sent out by the Anapaya CA product as well as common operations that are helpful when troubleshooting.
Alerts¶
InfraVaultMountPointsLimit¶
This alert fires if Vault is reaching the limit of mount points it can handle.
According to the official documentation, Vault can handle up to ~14000 mount points. Under normal conditions, the number of mount points should be nowhere near this limit.
Actions¶
Check the current number of mounts via the /sys/mounts endpoint.
Disable the old PKI engines that are no longer needed for auditing purposes following the instructions in the documentation.
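As a rough sketch of the first action, the number of mounts can be compared against the documented limit. The vault secrets list command is a standard Vault CLI call that reads /sys/mounts. The helper names mount_usage_percent and current_mount_count, the use of jq, and the default limit of 14000 are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: percentage of the mount limit currently in use.
# $1 = current number of mounts, $2 = limit (defaults to the ~14000 cited above).
mount_usage_percent() {
  count=$1
  limit=${2:-14000}
  echo $(( count * 100 / limit ))
}

# Hypothetical helper: count the secrets engine mounts reported by /sys/mounts.
# Assumes VAULT_ADDR and VAULT_TOKEN are set and that jq is installed.
current_mount_count() {
  vault secrets list -format=json | jq 'length'
}
```

With both helpers defined, mount_usage_percent "$(current_mount_count)" prints the share of the limit in use as an integer percentage.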
InfraVaultSealed¶
This alert fires if the Vault instance is sealed. This can happen for example after a reboot of the machine running Vault.
Vault stores all secrets encrypted on disk. When the Vault instance is sealed, it does not have access to the decryption key and therefore cannot access the stored secrets. For a Vault instance to operate as expected, it needs to be unsealed.
Actions¶
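A minimal sketch for confirming the seal state, assuming the standard Vault CLI is available on the machine. The vault status command reports the seal state through its exit code (0 for unsealed, 2 for sealed, 1 for an error). The helper names below are hypothetical, and the unseal procedure itself (key shares or auto-unseal) depends on how the instance was deployed.

```shell
#!/bin/sh
# Hypothetical helper: translate the exit code of `vault status` into text.
# The Vault CLI uses 0 = unsealed, 2 = sealed, 1 = error.
seal_state_from_exit_code() {
  case "$1" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}

# Hypothetical wrapper: query the Vault instance and print its seal state.
# Assumes VAULT_ADDR points at the instance in question.
vault_seal_state() {
  vault status >/dev/null 2>&1
  seal_state_from_exit_code "$?"
}
```

If the instance uses manual unsealing with key shares, vault operator unseal prompts for one share per invocation and is repeated until the threshold is reached.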
SCIONVaultCACertificateExpiring¶
This alert fires if the Vault CA Backend for the ISD-AS in question has a CA certificate provisioned that will expire in less than 4 days. If the remaining CA certificate validity falls below 3 days, AS certificate chain renewal will not be possible anymore.
Actions¶
Temporary fix¶
You can temporarily lower the validity period of the issued AS certificates. This permits issuance to continue, even if the remaining CA certificate validity falls below the minimum of 3 days. To do so, reduce the default TTL in the ca/{{ isd_as }}/pki engine to a value smaller than 3 days, where isd_as is the ISD-AS for which the certificate did not renew. This can be achieved through the CLI tool or the HTTP API.
Warning
This action does not solve the root cause of the problem. It only allows AS certificates to be renewed with a validity period shorter than the default of 3 days. Once the CA certificate expires, AS certificate renewal will also stop.
Warning
This action must be manually reverted as soon as there is a new CA certificate, otherwise AS certificates will be renewed with validity periods below the recommended 3 days.
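The temporary fix could look roughly like the following with the Vault CLI. This sketch assumes that the "default TTL" refers to the mount-level default lease TTL, which vault secrets tune can change, and that the ISD-AS appears in the mount path exactly as written. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the CA PKI mount path for an ISD-AS.
ca_pki_mount() {
  printf 'ca/%s/pki\n' "$1"
}

# Hypothetical helper: set the default lease TTL on the CA PKI mount.
# $1 = ISD-AS, $2 = TTL as a Vault duration string, for example 48h.
# Assumes VAULT_ADDR and VAULT_TOKEN are set.
set_ca_default_ttl() {
  vault secrets tune -default-lease-ttl="$2" "$(ca_pki_mount "$1")"
}
```

For example, set_ca_default_ttl 1-ff00:0:1 48h lowers the default to two days. Recording the current value beforehand, for instance with vault read sys/mounts/ca/1-ff00:0:1/pki/tune, makes the manual revert straightforward.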
Troubleshooting¶
Check if the vaultca service is running.
If the vaultca service is running, check if the logs have any errors for the CA certificate renewal.
If the vaultca service is not running, start the vaultca service and check the logs for the details of the renewed CA certificate or errors.
Manually trigger the CA certificate update for the relevant ISD-AS(es).
SCIONVaultCACertificateInFuture¶
This alert fires if the Vault CA Backend for the ISD-AS in question has a CA certificate provisioned with the NotBefore time in the future. This means that no AS certificate chains can be issued for the current moment.
This issue usually occurs when clock synchronization is off.
Actions¶
Check if the time synchronization of the clocks is working as expected. The specific commands to use differ depending on the operating system and the setup.
If time synchronization is off, synchronize the clock on the device.
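As one possible check on a systemd-based host, timedatectl exposes an NTPSynchronized property. The sketch below assumes such a host. The helper names are hypothetical, and other setups need different commands.

```shell
#!/bin/sh
# Hypothetical helper: interpret the NTPSynchronized property of timedatectl.
sync_state() {
  case "$1" in
    yes) echo "synchronized" ;;
    no)  echo "not synchronized" ;;
    *)   echo "unknown" ;;
  esac
}

# Hypothetical wrapper: query systemd for the synchronization state.
# Assumes a systemd version that supports `timedatectl show`.
check_clock_sync() {
  sync_state "$(timedatectl show --property=NTPSynchronized --value)"
}
```

On hosts that run chrony instead, chronyc tracking reports the current offset from the reference clock.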
SCIONVaultCertificateDestinguishedNameMismatch¶
This alert fires if the Vault CA Backend for the ISD-AS in question has a certificate provisioned where the distinguished name of the issuer or the subject belongs to a different ISD-AS. This should never happen in practice. If the alert fires, something has gone really wrong in the Vault setup and renewal process.
Actions¶
This alert should never be triggered. Contact Anapaya Support.
SCIONVaultCANotRolledOver¶
This alert fires if the Vault CA Backend for the Vault instance has not been rolled over within the last 7 days.
In general, Vault instances have a periodic task that renews the CA certificate and rolls over the CA PKI engine every week. This alert shows that there was a problem either with renewing the CA certificate or with rolling over the PKI engine to the newest one.
Note
As soon as the CA certificate expiration date is less than 3 days in the future, the automatic AS certificate renewal will stop working.
Actions¶
Check if the vaultca service is running.
If the vaultca service is running, check if the logs have any errors for the CA rollover process.
If the vaultca service is not running, start the vaultca service and check the logs for the details of the renewed CA certificate, the CA rollover or errors.
Manually trigger the CA certificate update for the relevant ISD-AS(es).
SCIONVaultCARenewalNotOperating¶
This alert fires if the automatic CA certificate renewal, run by the vaultca service, is not operating.
Actions¶
Check if the vaultca service is running.
If the vaultca service is running, check if the logs have any errors for the CA rollover process.
If the vaultca service is not running, start the vaultca service and check the logs for the details of the renewed CA certificate, the CA rollover or errors.
SCIONVaultRootCertificateExpiring¶
This alert fires if the Vault CA Backend for the ISD-AS in question has a root certificate provisioned that expires in the next 45 days. If the root certificate expires, no CA certificate chains, and eventually no AS certificate chains, can be issued.
There are two cases where this alert can be triggered:
The TRC for the ISD was updated, but the root engine for the ISD-AS in question was not re-provisioned.
The TRC is also expiring, along with the root certificate for the ISD-AS in question.
Actions¶
If the TRC was recently updated but the root engine was not updated, then re-provision the root engine.
If the TRC is also expiring, start the provisioning process for a new TRC. Initiate the TRC update process with a quorum of voting members. The details of this process depend on the governance rules of your respective ISD.
SCIONVaultRootCertificateInFuture¶
This alert fires if the Vault CA Backend for the ISD-AS in question has a root certificate provisioned with the NotBefore time in the future. This means that no CA certificate chains can be issued at the current moment.
This issue usually occurs when clock synchronization is off.
Actions¶
Check if the time synchronization of the clocks is working as expected. The specific commands to use differ depending on the operating system and the setup.
If time synchronization is off, synchronize the clock on the device.
SCIONVaultTRCExpiring¶
This alert fires if the TRC stored in the Vault CA Backend for the ISD-AS in question is expiring within 30 days. If the TRC expires, then the ISD-AS will not be able to validate information related to the path discovery process.
There are two cases in which this alert can be triggered:
The TRC was recently updated, but it was not pushed to the secret engine storage for the ISD-AS in question.
The TRC has not been recently updated and will expire soon.
Actions¶
If the TRC was recently updated but not pushed to the secret engine, push the new TRC.
If the TRC is expiring and there is no new TRC available, follow this process:
Inspect the TRC to verify that it expires soon.
Initiate the TRC update process with a quorum of voting members. The details of this process depend on the governance rules of your respective ISD.
SCIONVaultTRCInFuture¶
This alert fires if the TRC stored in the Vault CA Backend for the ISD-AS in question is provisioned with a Not_Before value in the future.
This can occur in two cases:
The latest TRC was pushed to the secrets engine before the time agreed with the other voting parties for the TRC.
Clock synchronization is off.
Actions¶
If you pushed the TRC before the agreed time, you need to push the previous TRC to engine storage for the ISD-AS in question.
If time synchronization is off, synchronize the clock on the device. The specific commands to use differ depending on the operating system and the setup.
Common Operations¶
Start the vaultca service¶
SSH to the given machine.
Run the following command:
systemctl start vaultca.service
Ensure that there are no errors in the output and that the status of the service is active (running).
Check whether vaultca service is running¶
SSH to the given machine.
Run the following command:
systemctl status vaultca.service
The output will include the status of the service, which should be active (running).
Inspect vaultca service logs¶
SSH to the given machine.
Run the following command:
journalctl -u vaultca.service
Tip
To see only the recent logs use:
journalctl -u vaultca.service --since=<time-duration>
For example, to check the logs of the last minute, pass a relative time prefixed with a minus sign:
journalctl -u vaultca.service --since=-1m
Tip
The logs are also exposed to Loki and can be viewed with Grafana.
Manually renew CA Certificates¶
SSH to the given machine.
Run the following command. By default, the configuration file of the vaultca service is located at /etc/vaultca/config.toml. If the configuration file is located somewhere else, make sure to pass the correct path as input to the --config flag.
vaultca renew --config=/etc/vaultca/config.toml
Note
This will update the CA Certificates for all the ISD-ASes configured on the host. To update the CA Certificate for a specific ISD-AS, you can pass the --isd-as flag as an input.
For example, to renew the CA Certificate for ISD-AS 1-ff00:0:1, you can run:
vaultca renew --config=/etc/vaultca/config.toml --isd-as=1-ff00:0:1
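When only a subset of the configured ISD-ASes needs renewal, the command can be repeated per ISD-AS. The following is a minimal sketch using only the --config and --isd-as flags described above. The function name renew_isd_ases and the DRY_RUN switch are hypothetical additions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: renew CA certificates for several ISD-ASes in turn.
# $1 = path to the vaultca configuration file, remaining arguments = ISD-ASes.
# With DRY_RUN=1 the commands are printed instead of executed.
renew_isd_ases() {
  config=$1
  shift
  for isd_as in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "vaultca renew --config=${config} --isd-as=${isd_as}"
    else
      # Stop at the first failure so the error is not buried in later output.
      vaultca renew --config="$config" --isd-as="$isd_as" || return 1
    fi
  done
}
```

Running it once with DRY_RUN=1 shows the exact commands before anything is changed, for example renew_isd_ases /etc/vaultca/config.toml 1-ff00:0:1 1-ff00:0:2.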
Inspect TRC¶
SSH to the machine that runs the Vault instance.
Get the latest version of the TRC using the vault kv get command and following the official Vault documentation. The latest TRC is stored under the path ca/<isd-as>/kv/trcs/latest.
Use the scion-pki tool to inspect the TRC details.
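The lookup could be sketched as follows. The path layout is taken from the step above. The helper names are hypothetical, and because the field layout of the stored secret is not described here, the sketch prints the raw secret rather than guessing how to extract the TRC.

```shell
#!/bin/sh
# Hypothetical helper: build the KV path of the latest TRC for an ISD-AS.
latest_trc_path() {
  printf 'ca/%s/kv/trcs/latest\n' "$1"
}

# Hypothetical helper: print the secret that holds the latest TRC.
# Assumes VAULT_ADDR and VAULT_TOKEN are set. Extracting the TRC from
# the secret is left to the operator, since its fields are not documented here.
show_latest_trc_secret() {
  vault kv get "$(latest_trc_path "$1")"
}
```

Once the TRC has been saved to a file, it can be examined with scion-pki, assuming a version that provides the trc inspect subcommand.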
Push new TRC¶
SSH to the machine that runs the Vault instance.
Push the new version of the TRC by following the official Vault documentation. The latest TRC is stored under the path ca/<isd-as>/kv/trcs/latest.
Provision new Root CA engine¶
SSH to the machine that runs the Vault instance.
Ensure that the Root PKI engine is enabled by following the official Vault documentation. Add the following fields in the JSON-formatted body of the request:
The path for the PKI engine is root/<ISD-AS>/pki.
The type should be marked as pki.
You can add an optional description such as Root PKI for <ISD-AS>.
Tune the Root PKI engine with default and max lease TTL. Follow the official Vault documentation and configure the default_lease_ttl to be 264h and the max_lease_ttl to be 720h.
Generate a new root key and certificate bundle.
Configure Root PKI with the root key and root certificate bundle. Follow the official Vault documentation and use the root/<ISD-AS>/pki/config/ca path. Include as pem_bundle the root key and certificate generated in the previous step.
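The steps above describe HTTP API requests. As a hedged alternative, the same sequence can be sketched with the Vault CLI, assuming VAULT_ADDR and VAULT_TOKEN are set and that the PEM bundle from the generation step already exists on disk. The function names and the DRY_RUN switch are hypothetical, and generating the bundle itself is not covered.

```shell
#!/bin/sh
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: enable, tune and configure the Root PKI engine.
# $1 = ISD-AS, $2 = path to the PEM bundle containing the root key and certificate.
provision_root_engine() {
  isd_as=$1
  bundle=$2
  mount="root/${isd_as}/pki"
  # Enable the PKI engine at the expected path with a description.
  run vault secrets enable -path="$mount" -description="Root PKI for ${isd_as}" pki || return 1
  # Apply the default and max lease TTLs given in the steps above.
  run vault secrets tune -default-lease-ttl=264h -max-lease-ttl=720h "$mount" || return 1
  # Upload the key and certificate; the @ prefix makes the CLI read the file.
  run vault write "${mount}/config/ca" pem_bundle=@"$bundle" || return 1
}
```

Setting DRY_RUN=1 before calling provision_root_engine 1-ff00:0:1 bundle.pem prints the three commands without contacting Vault, which is a convenient way to review them first.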