
Remove a master node from a HA Kubernetes cluster

Posted by Virgil

A few weeks ago, one of the master nodes in one of my clusters became unstable. I got an error when I tried to replace it, so I decided to write an article about it.

Starting point

I have 3 master nodes, as shown below, and we will see how to replace master3.

[root@master1]$ kubectl get node  --selector='node-role.kubernetes.io/master' -o name
node/master1
node/master2
node/master3

I also have 3 etcd pods, one per master node.

[root@master1]$ kubectl -n kube-system get pods -l component=etcd -o name
pod/etcd-master1
pod/etcd-master2
pod/etcd-master3

Remove the node

We will remove the master3 node from the cluster. Note that kubectl delete needs the resource type (node) before the name.

kubectl delete node master3
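
If master3 is still reachable and running pods, it is usually safer to drain it before deleting the node object. The sketch below wraps both steps in a helper; the function name remove_node and the KUBECTL override are my own additions for illustration, not something from the original setup. The --delete-emptydir-data flag assumes a reasonably recent kubectl (older releases called it --delete-local-data).

```shell
# Hypothetical helper: drain a node, then delete it from the cluster.
# Set KUBECTL=echo to preview the commands without touching the cluster.
remove_node() {
    node="$1"
    kubectl_bin="${KUBECTL:-kubectl}"
    # Evict regular pods first; DaemonSet-managed pods are left in place.
    "$kubectl_bin" drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
    # Remove the node object itself.
    "$kubectl_bin" delete node "$node"
}

# Usage (preview only):
#   KUBECTL=echo remove_node master3
```

If the node is already dead, the drain may hang or fail; in that case deleting the node directly, as above, is the remaining option.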

Clean etcd

This is the tricky part: if you try to re-add or replace your node without first removing it from etcd, it won’t work, because etcd still lists the old member. Let’s open a shell in the etcd pod on master1.

kubectl -n kube-system exec -ti etcd-master1 -- sh

If you list the members of the etcd cluster, master3 is still present.

# etcdctl -w table member list --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key

+------------------+---------+---------+--------------------------+--------------------------+------------+
|        ID        | STATUS  |  NAME   |        PEER ADDRS        |       CLIENT ADDRS       | IS LEARNER |
+------------------+---------+---------+--------------------------+--------------------------+------------+
| 2ed5356f21c6cb3f | started | master1 | https://192.168.0.1:2380 | https://192.168.0.1:2379 |      false |
| 4dc5b3633fbc1106 | started | master2 | https://192.168.0.2:2380 | https://192.168.0.2:2379 |      false |
| d1a4440f2bc1471e | started | master3 | https://192.168.0.3:2380 | https://192.168.0.3:2379 |      false |
+------------------+---------+---------+--------------------------+--------------------------+------------+
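
The TLS flags are long and easy to mistype. etcdctl also reads its flags from ETCDCTL_-prefixed environment variables, so one possible shortcut is to export them once inside the etcd container. The paths below are the same ones used in the command above; treat this as a convenience sketch rather than a required step.

```shell
# Run inside the etcd container, once per shell session.
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key

# The certificate flags can then be left out:
#   etcdctl -w table member list
```

Use either the variables or the flags, not both at once: depending on the etcdctl version, setting a flag that is also defined in the environment may be rejected as a conflict.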

Now we can remove the master3 member using its ID.

etcdctl member remove d1a4440f2bc1471e --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key
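
Copying the ID by hand is error-prone, and removing the wrong member from a 3-node cluster is painful. As a rough sketch, the ID can be looked up by name from the default (non-table) output of etcdctl member list, which prints one comma-separated line per member. The function name member_id_by_name is hypothetical, and it is meant to run on the master host, since the etcd image may not ship awk.

```shell
# Hypothetical helper: print the ID of the etcd member whose name is $1.
# Reads the default `etcdctl member list` output on stdin, for example:
#   d1a4440f2bc1471e, started, master3, https://192.168.0.3:2380, https://192.168.0.3:2379, false
member_id_by_name() {
    # Fields are separated by ", "; field 1 is the ID, field 3 is the name.
    awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Usage from master1 (same certificate paths as above):
#   kubectl -n kube-system exec etcd-master1 -- etcdctl member list \
#     --cacert /etc/kubernetes/pki/etcd/ca.crt \
#     --cert /etc/kubernetes/pki/etcd/server.crt \
#     --key /etc/kubernetes/pki/etcd/server.key | member_id_by_name master3
```

The helper prints nothing when no member matches, so it is worth checking that the result is non-empty before passing it to etcdctl member remove.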

master3 is no longer a member of the etcd cluster:

# etcdctl -w table member list --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key

+------------------+---------+---------+--------------------------+--------------------------+------------+
|        ID        | STATUS  |  NAME   |        PEER ADDRS        |       CLIENT ADDRS       | IS LEARNER |
+------------------+---------+---------+--------------------------+--------------------------+------------+
| 2ed5356f21c6cb3f | started | master1 | https://192.168.0.1:2380 | https://192.168.0.1:2379 |      false |
| 4dc5b3633fbc1106 | started | master2 | https://192.168.0.2:2380 | https://192.168.0.2:2379 |      false |
+------------------+---------+---------+--------------------------+--------------------------+------------+

Conclusion

You can now add your master node to the cluster again. If you haven’t reinstalled it, don’t forget to remove all the existing configuration first and to deploy the cluster keys again.

To reset the node, run this from master3:

kubeadm reset -f
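
For the join itself, here is a minimal sketch of how the pieces fit together, assuming the cluster was built with kubeadm and uses stacked etcd (which the etcd-masterN pods suggest). On a healthy master, kubeadm can re-upload the control-plane certificates and print a join command; the join command then needs two extra flags to join as a master rather than as a worker. The helper name control_plane_join_cmd is my own and only concatenates strings.

```shell
# On a healthy master (for example master1), run:
#   kubeadm init phase upload-certs --upload-certs   # prints a certificate key
#   kubeadm token create --print-join-command        # prints a worker join command

# Hypothetical helper: turn the printed join command into a control-plane join.
#   $1 = join command printed by kubeadm, $2 = certificate key
control_plane_join_cmd() {
    printf '%s --control-plane --certificate-key %s\n' "$1" "$2"
}

# Usage: print the final command, then run it on master3 after the reset.
#   control_plane_join_cmd "$(kubeadm token create --print-join-command)" "<certificate key>"
```

The uploaded certificates expire after a short time, so the join should be run soon after the upload step.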
