How to solve certificate expired issue in Rancher OS

Problem

The rancher web application UI became inaccessible. Investigation into the logs shows that the certificate has expired

time="2021-12-29T08:27:32.616638402Z" level=info msg="Waiting for master node startup: resource name may not be empty" 2021-12-29 08:27:32.985756 I | http: TLS handshake error from 127.0.0.1:35568: remote error: tls: bad certificate time="2021-12-29T08:27:32.987398748Z" level=error msg="server https://127.0.0.1:6443/cacerts is not trusted: Get https://127.0.0.1:6443/cacerts: x509: certificate has expired or is not yet valid" 2021-12-29 08:27:32.987447 I | http: TLS handshake error from 127.0.0.1:35570: remote error: tls: bad certificate time="2021-12-29T08:27:33.620623487Z" level=info msg="Waiting for master node startup: resource name may not be empty" 2021/12/29 08:27:34 [INFO] Waiting for server to become available: Get "https://127.0.0.1:6443/version?timeout=15m0s": x509: certificate has expired or is not yet valid: current time 2021-12-29T08:27:34Z is after 2021-12-28T06:35:39

Solution

  1. SSH into the RancherOS Server on AWS
$ ssh -i /path/to/secure/key.pem rancher@<rancher-domain>
  1. Check that the date and time on the server is correct. Change to correct date if incorrect
[rancher@<rancher-domain> ~]$ date
Fri Feb 11 17:32:28 UTC 2022
  • If the date shown is incorrect, then change the date to the correct date by using the following command
[rancher@<rancher-domain> ~]$ sudo date --set="2022-02-11 17:32:28"
  1. Ensure that the rancher docker image is running
[rancher@<rancher-domain> ~]$ docker ps
CONTAINER ID        IMAGE                    COMMAND             CREATED             STATUS              PORTS                                      NAMES
480a9e0df709        rancher/rancher:latest   "entrypoint.sh"     1 month ago       Up 6 hours          0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   rancher
  1. Login to the rancher container (using container id from #3)
[rancher@<rancher-domain> ~]$ docker exec -it rancher /bin/sh
  1. Delete the old certificates by running the following commands
sh-4.4# kubectl --insecure-skip-tls-verify -n kube-system delete secrets k3s-serving
sh-4.4# kubectl --insecure-skip-tls-verify delete secret serving-cert -n cattle-system
sh-4.4# rm -f /var/lib/rancher/k3s/server/tls/dynamic-cert.json
  1. Refresh the rancher certificates
sh-4.4# curl --insecure -sfL https://localhost:8443/v3
  1. Exit the docker container
sh-4.4# exit
exit
  1. Restart the docker container
[rancher@<rancher-domain> ~]$ docker restart rancher

Once the container is restarted, then the web interface should become available again.