Adding and removing nodes from the cluster

You may need to add or remove nodes from the cluster in a non-disruptive manner. Before adding a new node, install all the required packages on it (such as libnss-sss libpam-sss sssd sssd-tools adcli packagekit krb5-user), distribute the keytab and sssd.conf file to it, and configure its krb5.conf file as described in the previous steps. The new node also needs pacemaker, corosync and pcs installed, and its hacluster user should have a password matching the hacluster password on the other nodes (recommended). Finally, the pcsd service should be started and the new node registered in DNS; an example setup for the new node is sketched below, after the cluster status output. Verify the status of the existing cluster:

tux@Fusion-3021:~$ sudo pcs status
Cluster name: tsmb_ha
Cluster Summary:
* Stack: corosync
* Current DC: Fusion-3021.9.16-02.fusion.tuxera (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Fri Oct 29 14:04:32 2021
* Last change: Fri Oct 29 13:45:55 2021 by root via cibadmin on Fusion-3021.9.16-01.fusion.tuxera
* 2 nodes configured
* 4 resource instances configured
Node List:
* Online: [ Fusion-3021.9.16-01.fusion.tuxera Fusion-3021.9.16-02.fusion.tuxera ]
Full List of Resources:
* virtual_ip_A (ocf::heartbeat:IPaddr2): Started Fusion-3021.9.16-01.fusion.tuxera
* virtual_ip_B (ocf::heartbeat:IPaddr2): Started Fusion-3021.9.16-02.fusion.tuxera
* Clone Set: tsmb_ha-clone [tsmb_ha]:
* Started: [ Fusion-3021.9.16-01.fusion.tuxera Fusion-3021.9.16-02.fusion.tuxera ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
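
Before joining the cluster, the prerequisites described above could be set up on the new node roughly as follows. This is only a sketch for a Debian-based system; exact package names and the domain-join steps depend on your distribution and environment:

tux@Fusion-3021:~$ sudo apt install libnss-sss libpam-sss sssd sssd-tools adcli packagekit krb5-user
tux@Fusion-3021:~$ sudo apt install pacemaker corosync pcs
tux@Fusion-3021:~$ sudo passwd hacluster   # use the same password as on the existing nodes
tux@Fusion-3021:~$ sudo systemctl enable --now pcsd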

Verify the status of the new node:

tux@Fusion-3021:~$ sudo pcs status
Cluster name: debian
WARNINGS:
No stonith devices and stonith-enabled is not false
Cluster Summary:
* Stack: corosync
* Current DC: node1 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Fri Oct 29 15:42:48 2021
* Last change: Fri Oct 29 15:21:46 2021 by hacluster via crmd on node1
* 1 node configured
* 0 resource instances configured
Node List:
* Online: [ node1 ]
Full List of Resources:
* No resources
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

Since the new node comes up in a default cluster after setup, destroy that cluster (ONLY on the new node!):

tux@Fusion-3021:~$ sudo pcs cluster destroy
Shutting down pacemaker/corosync services...
Killing any remaining services...
Removing all cluster configuration files...

From an existing node (one that is already a member of the cluster), authorize the new node and then add it to the cluster:

tux@Fusion-3021:~$ sudo pcs host auth -u hacluster -p password123 Fusion-3021.9.16-03.fusion.tuxera
Fusion-3021.9.16-03.fusion.tuxera: Authorized
tux@Fusion-3021:~$ sudo pcs cluster node add Fusion-3021.9.16-03.fusion.tuxera
No addresses specified for host 'Fusion-3021.9.16-03.fusion.tuxera', using 'fusion-3021.9.16-03.fusion.tuxera'
Disabling SBD service...
Fusion-3021.9.16-03.fusion.tuxera: sbd disabled
Sending 'corosync authkey', 'pacemaker authkey' to 'Fusion-3021.9.16-03.fusion.tuxera'
Fusion-3021.9.16-03.fusion.tuxera: successful distribution of the file 'corosync authkey'
Fusion-3021.9.16-03.fusion.tuxera: successful distribution of the file 'pacemaker authkey'
Sending updated corosync.conf to nodes...
Fusion-3021.9.16-03.fusion.tuxera: Succeeded
Fusion-3021.9.16-02.fusion.tuxera: Succeeded
Fusion-3021.9.16-01.fusion.tuxera: Succeeded
Fusion-3021.9.16-02.fusion.tuxera: Corosync configuration reloaded

After the node is added, create a virtual IP resource with a location constraint for the new node:

tux@Fusion-3021:~$ sudo pcs resource create virtual_ip_C ocf:heartbeat:IPaddr2 ip=10.13.0.17 cidr_netmask=32 op monitor interval=1s
tux@Fusion-3021:~$ sudo pcs constraint location virtual_ip_C prefers Fusion-3021.9.16-03.fusion.tuxera

Verify the status of the cluster, noting that the new virtual IP is available (started on an existing node) but the new node and its clone resource instance are not yet online:

tux@Fusion-3021:~$ sudo pcs status
Cluster name: tsmb_ha
Cluster Summary:
* Stack: corosync
* Current DC: Fusion-3021.9.16-01.fusion.tuxera (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Fri Oct 29 15:58:10 2021
* Last change: Fri Oct 29 15:56:41 2021 by root via cibadmin on Fusion-3021.9.16-02.fusion.tuxera
* 2 nodes configured
* 5 resource instances configured
Node List:
* Online: [ Fusion-3021.9.16-01.fusion.tuxera Fusion-3021.9.16-02.fusion.tuxera ]
Full List of Resources:
* virtual_ip_A (ocf::heartbeat:IPaddr2): Started Fusion-3021.9.16-01.fusion.tuxera
* virtual_ip_B (ocf::heartbeat:IPaddr2): Started Fusion-3021.9.16-02.fusion.tuxera
* virtual_ip_C (ocf::heartbeat:IPaddr2): Started Fusion-3021.9.16-01.fusion.tuxera
* Clone Set: tsmb_ha-clone [tsmb_ha]:
* Started: [ Fusion-3021.9.16-01.fusion.tuxera Fusion-3021.9.16-02.fusion.tuxera ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled

Then on the new node, start and enable cluster services:

tux@Fusion-3021:~$ sudo pcs cluster start
Starting Cluster...
tux@Fusion-3021:~$ sudo pcs cluster enable
tux@Fusion-3021:~$

Verify the status of the cluster, noting that the new node now hosts resources:

tux@Fusion-3021:~$ sudo pcs status
Cluster name: tsmb_ha
Cluster Summary:
* Stack: corosync
* Current DC: Fusion-3021.9.16-01.fusion.tuxera (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Fri Oct 29 16:03:43 2021
* Last change: Fri Oct 29 15:59:30 2021 by hacluster via crmd on Fusion-3021.9.16-01.fusion.tuxera
* 3 nodes configured
* 6 resource instances configured
Node List:
* Online: [ Fusion-3021.9.16-01.fusion.tuxera Fusion-3021.9.16-02.fusion.tuxera Fusion-3021.9.16-03.fusion.tuxera ]
Full List of Resources:
* virtual_ip_A (ocf::heartbeat:IPaddr2): Started Fusion-3021.9.16-01.fusion.tuxera
* virtual_ip_B (ocf::heartbeat:IPaddr2): Started Fusion-3021.9.16-02.fusion.tuxera
* virtual_ip_C (ocf::heartbeat:IPaddr2): Started Fusion-3021.9.16-03.fusion.tuxera
* Clone Set: tsmb_ha-clone [tsmb_ha]:
* Started: [ Fusion-3021.9.16-01.fusion.tuxera Fusion-3021.9.16-02.fusion.tuxera Fusion-3021.9.16-03.fusion.tuxera ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled

Verify that the SMB file shares are available on the virtual IP of the new node, then add the new virtual IP address to the round-robin DNS entry used for client access.
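
One way to spot-check the shares is with smbclient from any machine that can reach the cluster; the address below is the new node's virtual IP, and DOMAIN\user stands in for any valid domain account:

tux@Fusion-3021:~$ smbclient -L //10.13.0.17 -U 'DOMAIN\user'

To remove a node from the cluster, first stop the cluster services on that node: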

tux@Fusion-3021:~$ sudo pcs cluster stop
[sudo] password for tux:
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...

Verify that the virtual IP has moved to another node (for example with sudo pcs status). Then remove the location constraint, first identifying its ID:

tux@Fusion-3021:~$ sudo pcs constraint list --full
Location Constraints:
Resource: virtual_ip_A
Enabled on:
Node: Fusion-3021.9.16-01.fusion.tuxera (score:50) (id:location-virtual_ip_A-Fusion-3021.9.16-01.fusion.tuxera-50)
Resource: virtual_ip_B
Enabled on:
Node: Fusion-3021.9.16-02.fusion.tuxera (score:50) (id:location-virtual_ip_B-Fusion-3021.9.16-02.fusion.tuxera-50)
Resource: virtual_ip_C
Enabled on:
Node: Fusion-3021.9.16-03.fusion.tuxera (score:50) (id:location-virtual_ip_C-Fusion-3021.9.16-03.fusion.tuxera-50)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Then remove it:

tux@Fusion-3021:~$ sudo pcs constraint location remove location-virtual_ip_C-Fusion-3021.9.16-03.fusion.tuxera-50

Create a new constraint that associates the virtual IP with another new node (if one is available); otherwise, create a constraint that associates it with an existing node, as sketched below. Until clients are no longer actively connected to that virtual IP, there is no transparent way to retire it.
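
For example, to keep virtual_ip_C in service on one of the existing nodes (Fusion-3021.9.16-01 is used here purely for illustration):

tux@Fusion-3021:~$ sudo pcs constraint location virtual_ip_C prefers Fusion-3021.9.16-01.fusion.tuxera

Finally, remove the node from the cluster: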

tux@Fusion-3021:~$ sudo pcs cluster node remove Fusion-3021.9.16-03.fusion.tuxera