Active-Active Cluster for Scale-Out and High Availability
Introduction
This guide explains how to configure Fusion File Share Server in an active-active cluster. This configuration consists of two separate server nodes, each running Fusion File Share Server, with both nodes active and serving the same SMB shares, allowing for uninterrupted operation if either node fails.
What You'll Learn in This Guide
By reading and following the instructions in this guide, you'll learn:
-
The Concept of an Active-Active Fusion File Share Server Cluster and its Benefits:
We will provide you with the necessary information to assess whether an active-active cluster is the right choice for your use case, as well as the benefits it offers your organization.
-
How to Configure an Active-Active Fusion File Share Server Cluster:
Step-by-step instructions will guide you through the process of setting up an active-active Fusion File Share Server cluster. This includes setting up the servers, configuring essential settings, and preparing your cluster for evaluation and testing. You'll gain practical knowledge of how to get your cluster up and running quickly and efficiently.
-
How to Test and Validate Your Active-Active Fusion File Share Server Cluster:
Make sure your system is highly available and fault tolerant by testing various failure modes and validating that service is not interrupted. You will then be able to assess the suitability of the installation for your production environment.
-
Optimizing Your Cluster's Performance:
Learn how to optimize your Fusion File Share Server cluster for better performance, without sacrificing availability and uptime. We will provide you with tips and best practices to tune your cluster and ensure it is running at its best.
The Use Case Described in this Guide
This guide provides steps for quickly setting up an active-active Fusion File Share Server cluster for continuous availability. It describes a simple use case; however, it can easily be adapted to your specific requirements.
The described use case comprises the following:
-
Two Server Nodes:
The cluster consists of two server nodes, running Ubuntu 22.04. The choice of operating system is arbitrary and can be replaced with your preferred distribution, although some commands (e.g., package installation) may differ. The number of nodes can be increased to provide additional redundancy and network capacity.
-
Active Directory Integration:
The cluster in this example is integrated with Active Directory for user authentication, and the Fusion File Share Server nodes are configured to join the domain.
-
One Share, Just to Get Started:
The cluster is configured with a single share, which is used to demonstrate high availability. You can easily expand the configuration to include more shares, as needed.
-
A Windows Client for Testing:
To validate the setup and run tests, the guide includes steps for testing the cluster using a Windows client. This involves connecting to the share from a Windows machine, performing basic file operations, and verifying that access to the share continues seamlessly after a node failure.
What is an Active-Active Cluster?
Active-active is a clustering configuration for Fusion File Share Server that provides all three benefits of clustering: continuous availability, scale-out, and high availability. In this mode, multiple servers operate simultaneously, forming a cluster where connections and workloads are distributed across all nodes. This approach leverages the combined networking bandwidth of all servers. It ensures that, in the event of a node failure, only a small number of clients are affected and must wait for their connections to resume on another server.
This configuration is ideal for environments requiring minimal downtime and the ability to scale by adding more servers to handle increased loads, though it is more complex to set up and manage.
Common use cases for active-active clustering include:
-
Business-Critical Applications in Larger Environments:
Active-active clustering is often used in large environments where downtime is costly and a single SMB server cannot handle the workload.
-
Environments with High Bandwidth and Low Latency Demands:
Active-active clustering is commonly deployed where bandwidth demands are high. Examples include:
- Video Production: Multiple users work on different projects simultaneously, requiring substantial bandwidth.
- Low-Latency Media Streaming: Large numbers of clients require high bandwidth and cannot tolerate buffering delays.
Installing Fusion File Share Server
The Fusion File Share Server is distributed as a compressed archive containing essential binary files and configuration templates. Typically, the archive's name resembles tuxera-smb-3024.3.22.1-r1-x86_64-jammy-user-evaluation-cluster.tgz
and it contains the following:
- smb/bin/tsmb-server: The Fusion File Share Server executable.
- smb/tools/tsmb-privilege: A utility to manage the privilege database.
- smb/tools/tsmb-status: A utility to display the status of the Fusion File Share Server and various statistics.
- smb/tools/tsmb-acls: A utility to manage ACLs.
- smb/tools/tsmb-passwd: A utility to manipulate the user database.
- smb/tools/tsmb-cfg: A utility to manage the configuration at runtime.
- smb/conf/tsmb.conf: A configuration file template.
- smb/README.SMB: A README file with additional information.
- smb/LICENSES.SMB: License information for third-party components included in the package.
- VERSION-INFO: A file containing the package version.
To install Fusion File Share Server, extract the archive and copy its contents to the relevant system directories.
In this example, the Fusion File Share Server binary is copied to /usr/sbin, the utilities to /usr/bin, and the configuration file to /etc. The binary and utility directories are typically included in the $PATH variable on Linux servers. However, you may choose alternative directories based on your specific requirements.
-
Extract the archive:
tar -xzf tuxera-smb-3024.3.22.1-r1-x86_64-jammy-user-evaluation-cluster.tgz
-
Change into the extracted directory:
cd tuxera-smb-3024.3.22.1-r1-x86_64-jammy-user-evaluation-cluster
-
Copy the Fusion File Share Server binary to /usr/sbin, or another location of your choice:
sudo cp -af smb/bin/tsmb-server /usr/sbin/tsmb-server
-
Copy the utilities to /usr/bin, or another location of your choice:
sudo cp -af smb/tools/* /usr/bin/
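-
Copy the configuration file template to /etc, or another location of your choice. (Later in this guide, the live cluster configuration is kept on shared storage, so this copy mainly serves as a local reference.)
sudo cp -af smb/conf/tsmb.conf /etc/tsmb.conf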
Prerequisites
Before you begin the installation and configuration of Fusion File Share Server, there are a few prerequisites you should be mindful of.
Active Directory
This guide assumes that you have an Active Directory domain:
- The Fusion File Share Server cluster nodes will be joined to this domain for user authentication.
- Active Directory is expected to use Kerberos for authentication.
Server Nodes
These are the machines where you will install Fusion File Share Server.
-
Operating System: Linux.
- [Supported Distributions ?]
- [Minimal Kernel Version ?]
- [Minimal glibc version ?]
- [Kernel parameters?]
- [SELinux/AppArmor?]
- [Can it run in a container? Docker? containerd? lxc/lxd?]
- [Inside Kubernetes?]
Note: While the instructions in this guide are based on Ubuntu 22.04, you can use any Linux distribution that meets these requirements, making the necessary adjustments to commands and package names.
-
Network:
- Ensure TCP port 445 is open in any firewalls between the server and the client machine.
- Ensure that the server can communicate with the domain controller over the network for LDAP and RFC2307 queries.
- It is recommended (but not strictly required) that the server nodes be connected to two separate networks: one solely for SMB traffic, and another for everything else (e.g., storage, SSH, etc.). See Network Separation in the Fusion File Share Server Clustering guide for more details.
-
Shared Storage:
- The server nodes should have access to shared storage, such as a SAN or NAS, to store user data, configuration, and continuous availability-related files.
Note: Active-active clustering requires the shared storage to be mounted on both nodes at boot time, which calls for a clustered or distributed file system, such as GlusterFS or NFS.
For more information on shared storage requirements and options, see the Shared Storage and Persistent State section of the Fusion File Share Server Clustering guide.
-
DNS:
- Ensure that the server nodes' DNS settings point to the domain's DNS server.
- Configure round-robin DNS for load balancing (see Enabling Round-Robin DNS later in this guide).
-
Active Directory:
- Ensure that you have the necessary Administrator credentials to join the server nodes to the domain.
-
NTP:
- Ensure that the domain controller and server nodes have properly configured NTP, since Kerberos requires a time skew of no more than 5 minutes between the client and the server.
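Before proceeding, you may want to spot-check a few of these prerequisites from each server node. A minimal sketch, assuming the acme.local domain and the host names used later in this guide:
# Confirm that the system clock is synchronized (Kerberos tolerates at most 5 minutes of skew)
timedatectl status
# Confirm that the domain's Kerberos SRV records resolve
nslookup -q=srv _kerberos._tcp.acme.local
# From another machine, confirm that TCP port 445 is reachable on a node
nc -zv fusion-srv1.acme.local 445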
Client
This is the machine from which you will access the SMB share served by your Fusion File Share Server.
- Operating system: Windows 10 or 11, or Windows Server 2016, 2019, or 2022, connected to the same network as the server.
- Network:
- Ensure TCP port 445 is open in any firewalls between the client and the server.
- Active Directory:
- The client machine should be joined to the same Active Directory domain as the Fusion File Share Server cluster.
[Optional] Configuring the Secondary Network Interfaces
As mentioned previously, it is recommended, but not strictly required, that the server nodes be connected to two separate networks: one for the client traffic, and one for the cluster communication.
In this example, we will configure a secondary network interface on each node that connects to the cluster communication network:
- Our primary client network is on ens18. This interface uses DHCP to obtain an IP address.
- Our secondary, SMB-only network is on ens19.
  - This interface will have a static IP address on each node:
    - On fusion-srv1: 10.13.0.100.
    - On fusion-srv2: 10.13.0.101.
  - This interface will also be assigned the cluster's floating IP addresses, 10.13.0.10 and 10.13.0.11, which will be distributed among the active nodes.
This guide is based on Ubuntu 22.04 and uses netplan for network configuration.
On fusion-srv1, edit the /etc/netplan/01-netcfg.yaml file to add the secondary network interface configuration:
network:
version: 2
renderer: networkd
ethernets:
ens18:
dhcp4: yes
ens19:
addresses: [10.13.0.100/16]
Apply the changes:
sudo netplan apply
On fusion-srv2, edit the /etc/netplan/01-netcfg.yaml file to add the secondary network interface configuration:
network:
version: 2
renderer: networkd
ethernets:
ens18:
dhcp4: yes
ens19:
addresses: [10.13.0.101/16]
Apply the changes:
sudo netplan apply
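You can verify that the static address was assigned on each node, for example:
ip -4 addr show ens19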
Joining the Domain and Adding UPN
Before you begin the cluster setup, you need to join the cluster nodes to the Active Directory domain. This is achieved using SSSD and adcli. (net from the Samba suite is also supported, but is not covered in this guide.)
-
Install the necessary packages:
sudo apt update -y && sudo apt upgrade -y
sudo apt install -y sssd-ad sssd-tools realmd adcli krb5-user sssd-krb5
-
On each server, set the hostname to the FQDN of the server (we will use fusion-srv1.acme.local and fusion-srv2.acme.local as examples):
On fusion-srv1:
sudo hostnamectl set-hostname fusion-srv1.acme.local
On fusion-srv2:
sudo hostnamectl set-hostname fusion-srv2.acme.local
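You can confirm the change on each node by printing the fully qualified hostname:
hostname -f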
-
Configure Kerberos by editing the /etc/krb5.conf file. Note that the default_realm should match the domain name in uppercase:
[libdefaults]
default_realm = ACME.LOCAL
rdns = false
-
Confirm domain discovery by running the following command:
sudo adcli info acme.local
The output should be similar to:
[domain]
domain-name = acme.local
domain-short = ACME
domain-forest = acme.local
domain-controller = DC1.acme.local
domain-controller-site = Default-First-Site-Name
domain-controller-flags = pdc gc ldap ds kdc timeserv closest writable full-secret ads-web
domain-controller-usable = yes
domain-controllers = DC1.acme.local
[computer]
computer-site = Default-First-Site-Name
-
Create the SSSD configuration at /etc/sssd/sssd.conf, replacing acme.local with your domain name, with the krb5_realm parameter in uppercase:
[sssd]
domains = acme.local
config_file_version = 2
services = nss
[domain/acme.local]
ad_domain = acme.local
krb5_realm = ACME.LOCAL
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
ldap_id_mapping = True
use_fully_qualified_names = false
access_provider = ad
-
Set the permissions on the SSSD configuration file:
sudo chmod 600 /etc/sssd/sssd.conf
-
Create a computer account and a krb5.keytab file for the cluster. We will use fusion-srv (without a number) as the cluster's computer account name. The domain and the computer name must be in uppercase:
sudo adcli join --domain ACME.LOCAL --service-name=cifs --computer-name FUSION-SRV --host-fqdn fusion-srv.ACME.LOCAL -v
Note: The --computer-name and --host-fqdn parameters should refer to the name by which the entire cluster will be known in the domain; hence, it's fine that they don't match the node's hostname. It is important, however, that the domain name portion of --host-fqdn matches the domain name of the node's hostname (albeit in uppercase).
-
Enable and restart the SSSD service:
sudo systemctl enable sssd
sudo systemctl restart sssd
-
Confirm that the computer is part of the domain and is able to query the domain controller by querying for information about a user (e.g., johndoe):
id johndoe
The output should be similar to:
uid=1414601123(johndoe) gid=1414600513(domain users)
groups=1414600513(domain users),1414601115(testgroup)
-
Copy the krb5.keytab file and the SSSD configuration to the other cluster node:
Note: For the following command to work, the public SSH key from the first node must be added to ~root/.ssh/authorized_keys on the second node.
sudo scp /etc/sssd/sssd.conf fusion-srv2:/etc/sssd/sssd.conf
sudo scp /etc/krb5.keytab fusion-srv2:/etc/krb5.keytab
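On the second node, set the same restrictive permissions on the copied SSSD configuration, then enable and restart SSSD there as well, mirroring the earlier steps:
sudo chmod 600 /etc/sssd/sssd.conf
sudo systemctl enable sssd
sudo systemctl restart sssd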
-
Open the Active Directory Users and Computers console, right-click the FUSION-SRV computer account, and click Properties.
-
Open the Attribute Editor tab and find the userPrincipalName attribute. Click Edit.
Note: If you don't see the Attribute Editor tab, you may need to enable it by clicking View > Advanced Features in the Active Directory Users and Computers console.
-
Make sure the userPrincipalName attribute is set to cifs/[email protected] (replace the domain name and the computer name with your own), and click OK.
Alternatively, you can use PowerShell:
Set-ADComputer -Identity FUSION-SRV -UserPrincipalName cifs/[email protected]
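You can verify the attribute afterwards (assuming the ActiveDirectory PowerShell module is available):
Get-ADComputer -Identity FUSION-SRV -Properties UserPrincipalName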
Enabling Round-Robin DNS
In order for the cluster nodes to serve the SMB shares in an active-active configuration, the DNS server should be configured to return multiple IP addresses for the same fusion-srv hostname. This is known as round-robin DNS.
-
In the DNS Manager snap-in, right-click on the DNS server and select Properties:
-
Under the Advanced tab, make sure the Enable round robin checkbox is checked, and click OK:
-
Make sure that Advanced is enabled in your View menu:
-
Navigate to the Forward Lookup Zones > acme.local zone and click Action > New Host (A or AAAA):
-
In the New Host dialog, enter the fusion-srv hostname and the IP address of the first node. Also, set the TTL to 0:0:0:0.
-
Then, repeat the action for the second node's IP address, setting the TTL as well.
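To confirm that round-robin DNS is working, query the name a few times from a client or server node; the answers should include both registered addresses, with their order rotating between queries:
nslookup fusion-srv.acme.local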
Configuring Fusion File Share Server
Now that the cluster nodes are configured to operate in the domain environment, we can proceed with the configuration of the Fusion File Share Server cluster.
The following steps assume that the shared storage is already mounted on /mnt/shared on the node where you are performing the configuration. If the shared storage is mounted on a different path, adjust the paths accordingly.
As mentioned before (and in the Shared Storage and Persistent State section of the Fusion File Share Server Clustering guide), for the configuration and the persistent state to be available on both nodes, we store them on the shared storage that is mounted on both nodes. In this guide, we assume that it's mounted on /mnt/shared. This directory also contains the files shared by the SMB server.
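How the shared storage is mounted depends on your environment. As a purely hypothetical example, an NFS export mounted at /mnt/shared on both nodes via /etc/fstab might look like this (the server name and export path are placeholders):
nfs-server.acme.local:/export/fusion /mnt/shared nfs defaults,_netdev 0 0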
The directory structure for the shared storage volume for this example would hence be as follows:
/mnt/shared
├── .tsmb # The configuration and persistent state directory, not visible to SMB clients, containing
│ ├── ca # - The persistent file handle database
│ ├── etc # - The configuration file
│ ├── privilegedb # - The privilege database
│ └── tcp_tickle # - The connection recovery database
└── sh1 # The data directory for the `SH1` share, visible to SMB clients
-
Create the directory structure for the Fusion File Share Server configuration and persistent state and the data directory on the shared storage volume, as well as the local runtime state directory (create the runtime state directory on both nodes, as it resides on local storage):
sudo mkdir -p /mnt/shared/.tsmb/{ca,etc,privilegedb,tcp_tickle}
sudo mkdir /var/lib/tsmb
sudo mkdir /mnt/shared/sh1
-
Create the configuration file at /mnt/shared/.tsmb/etc/tsmb.conf:
### GLOBAL CONFIGURATION ###
[global]
# The runtime state directory
runstate_dir = /var/lib/tsmb
# Enable scale-out, as we are configuring an active-active cluster
scale_out = true
# Perform authentication with Active Directory
userdb_type = ad
# The domain name
domain = acme.local
# The computer account name that clients will use to connect to SMB shares
server_name = fusion-srv
# Enable the persistent file handle database
ca = true
# Path to the persistent file handle database
ca_path = /mnt/shared/.tsmb/ca
# Path to the privilege database
privilegedb = /mnt/shared/.tsmb/privilegedb
# Enable connection recovery via TCP tickle and set the path to the database
tcp_tickle = true
tcp_tickle_params = path=/mnt/shared/.tsmb/tcp_tickle
# Enable logs at log level 4, and log to the specified file
log_level = 4
log_destination = file
log_params = path=/var/lib/tsmb/tsmb.log
# Listen on port 445 on the SMB-only network interface
listen = ens19,0.0.0.0,IPv4,445,DIRECT_TCP
[/global]
### SHARE CONFIGURATION ###
# Define the `SH1` share
[share]
# The share's name
netname = SH1
# The comment that will be displayed in the share's properties
remark = Test Share
# The path to the share's data directory
path = /mnt/shared/sh1
# The share's permissions (we're testing, so we're giving full access to everyone)
permissions = everyone:full
[/share]
In the annotated example above, we have configured the following:
-
The scale_out parameter is set to true:
As we are configuring an active-active cluster, we enable scale-out. This parameter enables or disables the scale-out feature, which allows multiple Fusion File Share Server nodes to serve the same shares.
-
The userdb_type parameter is set to ad:
This indicates that we are using Active Directory for user authentication.
-
The domain parameter is set to acme.local:
This indicates the domain name. Replace this with your own domain name.
-
The server_name parameter is set to fusion-srv:
This is the computer account name we chose when joining the cluster to Active Directory. It is the name clients will use to connect to SMB shares.
-
The ca_path, privilegedb, and tcp_tickle_params parameters are set to the paths of the persistent file handle database, the privilege database, and the connection recovery database, respectively, while ca and tcp_tickle enable the corresponding features.
-
The log_level, log_destination, and log_params parameters are set to log level 4, logging to a file, and the path of the log file, respectively. These can be changed later, but for now, we've set the log level to 4 so that we can verify the configuration and test some functionality.
-
The listen parameter is set to ens19,0.0.0.0,IPv4,445,DIRECT_TCP:
This indicates that the server will listen on port 445 on the ens19 network interface, which is the SMB-only network interface.
-
The share SH1 is defined with:
- The netname parameter set to SH1.
- The remark parameter set to Test Share.
- The path parameter set to /mnt/shared/sh1.
- The permissions parameter set to everyone:full.
Additional share configuration is possible. For more advanced share configuration parameters, see the SMB Features and Settings and Authorization and Access Management sections, where some topics are relevant to specific share behavior, or refer to the share configuration reference for a complete list of share configuration parameters.
It is also possible to set the ca and ca_params parameters for individual shares, if you want to disable continuous availability for a specific share or to use a different path for that share's persistent file handle database, respectively.
[Optional] Testing Active Directory Integration
You can now check that the Fusion File Share Server cluster nodes can communicate with the Active Directory domain controller. You can also test how Fusion File Share Server handles user authentication.
-
First, make sure that the domain controller is discoverable via DNS.
Check for the SRV record for _kerberos._tcp for your domain:
nslookup -q=srv _kerberos._tcp.acme.local
The output should be similar to:
Server: 127.0.0.53
Address: 127.0.0.53#53
Non-authoritative answer:
_kerberos._tcp.acme.local service = 0 100 88 dc1.acme.local.
Authoritative answers can be found from:
Then, check for the SRV record for _kpasswd._tcp for your domain:
nslookup -q=srv _kpasswd._tcp.acme.local
The output should be similar to:
Server: 127.0.0.53
Address: 127.0.0.53#53
Non-authoritative answer:
_kpasswd._tcp.acme.local service = 0 100 464 dc1.acme.local.
Authoritative answers can be found from:
-
Check that you are able to obtain tickets for the cifs service:
sudo kinit -k -V cifs/[email protected] -t /etc/krb5.keytab
The output should be similar to:
keytab specified, forcing -k
Using default cache: /tmp/krb5cc_0
Using principal: cifs/[email protected]
Using keytab: /etc/krb5.keytab
Authenticated to Kerberos v5
-
Check that the key version number (KVNO) for the cifs service matches between the KDC and the keytab file:
sudo klist -k -t /etc/krb5.keytab
The output should be similar to:
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
2 [email protected]
2 [email protected]
2 [email protected]
2 [email protected]
2 [email protected]
2 [email protected]
2 cifs/[email protected]
2 cifs/[email protected]
2 cifs/[email protected]
2 cifs/[email protected]
2 cifs/[email protected]
2 cifs/[email protected]
. . .
-
Check that you are able to obtain tickets for user accounts (e.g., johndoe):
sudo kinit -V [email protected]
The output should be similar to:
Using default cache: /tmp/krb5cc_0
Using principal: [email protected]:
Authenticated to Kerberos v5Then, list the tickets:
sudo klist
The output should be similar to:
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: [email protected]
Valid starting Expires Service principal
06/01/2024 14:42:22 06/02/2024 00:42:22 krbtgt/[email protected]
renew until 06/08/2024 14:42:12
-
Now, you can temporarily start the Fusion File Share Server on the node, to validate domain authentication further.
-
First, start Fusion File Share Server on the node:
sudo /usr/sbin/tsmb-server -c /mnt/shared/.tsmb/etc/tsmb.conf
-
Then, from a Windows client, try to access the share using the johndoe user account. You should be able to access the share without any issues.
-
After testing, stop the Fusion File Share Server by pressing Ctrl-C in the terminal.
-
Examine the logs at /var/lib/tsmb/tsmb.log to see if there was a successful LDAP connection to the domain controller. The log should contain lines similar to:
Using principal [email protected] for AD client
Resolving SRV RR _ldap._tcp.acme.local
Found URI[0]: ldap://dc1.acme.local:389
Resolving SRV RR _gc._tcp.acme.local
Found URI[0]: ldap://dc1.acme.local:3268
Trying ldap://dc1.acme.local:389
Connected to ldap://dc1.acme.local:389
Our domain SID S-1-5-21-788087510-3421900764-663072633
Our domain NETBIOS-Name 'ACME'
- You should also see a ticket inside Fusion File Share Server's ticket cache at /var/lib/tsmb/tsmb_ccache. You can examine it by running:
sudo klist -c /var/lib/tsmb/tsmb_ccache
The output should be similar to:
Ticket cache: FILE:/var/lib/tsmb/tsmb_ccache
Default principal: [email protected]
Valid starting Expires Service principal
06/01/2024 14:52:32 06/01/2024 15:52:32 krbtgt/[email protected]
06/01/2024 14:52:32 06/01/2024 15:52:32 ldap/dc1.acme.local@
06/01/2024 14:52:32 06/01/2024 15:52:32 ldap/[email protected]
Installing and Configuring Pacemaker and Corosync
Now that the Fusion File Share Server is configured, we can proceed with the installation and configuration of the Pacemaker and Corosync cluster software.
-
Install the necessary packages:
sudo apt -y install pacemaker corosync pcs
-
Set a password for the hacluster user:
sudo passwd hacluster
-
Configure hostname resolution on the nodes by editing the /etc/hosts file. Add the following lines:
10.13.0.100 fusion-srv1
10.13.0.101 fusion-srv2
Note: Alternatively, you can use DNS for hostname resolution. You can add the following A records to your DNS server:
fusion-srv1 IN A 10.13.0.100
fusion-srv2 IN A 10.13.0.101
-
Start pcsd:
sudo systemctl start pcsd
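You will likely also want pcsd to start automatically at boot (consistent with the pcsd: active/enabled daemon status shown later in this guide):
sudo systemctl enable pcsd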
-
Destroy the default cluster configuration:
sudo pcs cluster destroy
The output should be similar to:
Shutting down pacemaker/corosync services...
Killing any remaining services...
Removing all cluster configuration files...
-
Authenticate the nodes of the cluster:
sudo pcs host auth fusion-srv1 fusion-srv2 -u hacluster
You will be prompted to enter the password for the hacluster user that you set earlier, and should see the following output:
Password:
fusion-srv1: Authorized
fusion-srv2: Authorized
-
Set up the cluster:
sudo pcs cluster setup fusion_ha fusion-srv1 fusion-srv2 --force
The output should be similar to:
Destroying cluster on nodes: fusion-srv1, fusion-srv2...
fusion-srv1: Stopping Cluster (pacemaker)...
fusion-srv2: Stopping Cluster (pacemaker)...
fusion-srv1: Successfully destroyed cluster
fusion-srv2: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'fusion-srv1', 'fusion-srv2'
fusion-srv1: successful distribution of the file 'pacemaker_remote authkey'
fusion-srv2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
fusion-srv1: Succeeded
fusion-srv2: Succeeded
Synchronizing pcsd certificates on nodes fusion-srv1, fusion-srv2...
fusion-srv1: Success
fusion-srv2: Success
Restarting pcsd on the nodes in order to reload the certificates...
fusion-srv1: Success
fusion-srv2: Success
-
Start the cluster:
sudo pcs cluster start --all
The output should be similar to:
10.13.0.100: Starting Cluster...
10.13.0.101: Starting Cluster...
-
Disable STONITH (acceptable for this evaluation setup; production clusters normally configure fencing):
sudo pcs property set stonith-enabled=false
-
Set the cluster to keep running when quorum is lost (appropriate for a two-node cluster):
sudo pcs property set no-quorum-policy=ignore
-
Configure the floating IP addresses 10.13.0.10 and 10.13.0.11 as cluster resources monitored every second, and set them to prefer the first and second node, respectively:
sudo pcs resource create cluster_ip_A \
ocf:heartbeat:IPaddr2 \
ip=10.13.0.10 cidr_netmask=32 \
op monitor interval=1s
sudo pcs resource create cluster_ip_B \
ocf:heartbeat:IPaddr2 \
ip=10.13.0.11 cidr_netmask=32 \
op monitor interval=1s
sudo pcs constraint location cluster_ip_A prefers fusion-srv1.acme.local
sudo pcs constraint location cluster_ip_B prefers fusion-srv2.acme.local
-
Configure Fusion File Share Server as a cluster resource:
sudo pcs resource create fusion_ha \
ocf:heartbeat:anything \
binfile=/usr/sbin/tsmb-server \
cmdline_options="-c /mnt/shared/.tsmb/etc/tsmb.conf"
sudo pcs resource clone fusion_ha
-
Stop the cluster, so you can verify that the configuration comes up cleanly on a fresh start:
sudo pcs cluster stop --all
-
Start the cluster:
sudo pcs cluster start --all
At this point, the cluster should be up and running, and Fusion File Share Server should be serving the SH1 share at \\FUSION-SRV\SH1.
Testing Failover
Now that the cluster is up and running, we can test the failover process. This would entail:
-
Accessing the share from a Windows client and mapping it to a drive letter:
This will verify that:
- The cluster software is configured correctly and starts Fusion File Share Server on one of the nodes.
- Fusion File Share Server is configured correctly, is able to authenticate users, and is serving the share.
-
Creating a large file on the Windows client and copying it to the share:
This is used to test the failover process. The file should be large enough to take more than a few seconds to copy, allowing us to perform a failover while the file is being copied.
-
Performing a failover while the file is copying:
If the file copy continues without interruption, the failover was successful.
-
In a new CMD or PowerShell window, map the share to a drive letter, authenticating as an existing domain user (e.g., johndoe):
PS C:\Users\johndoe> net use Z: \\FUSION-SRV\SH1 /USER:[email protected]
You will be prompted to enter the password for the johndoe user.
If successful, you should have a new Z: drive mapped to the share.
-
-
Create a large file on the Windows client. Depending on your network speed, you'd want to create a file that would take more than a few seconds to copy, for example a 10GB file:
PS C:\Users\johndoe> fsutil file createnew C:\Users\johndoe\largefile.txt 10737418240
This will create a 10GB file named largefile.txt in the C:\Users\johndoe directory.
-
Copy the file to the share:
PS C:\Users\johndoe> copy C:\Users\johndoe\largefile.txt Z:\
The file copy should start and take enough time to allow us to perform the failover.
-
First, check the status of the cluster:
sudo pcs status
The output should be similar to:
Cluster name: fusion_ha
Stack: corosync
Current DC: fusion-srv1 (version 2.0.5-9e909a5bdd) - partition with quorum
Last updated: Fri Jun 1 15:00:00 2024
Last change: Fri Jun 1 14:49:53 2024 by root via cibadmin on fusion-srv1.acme.local
2 nodes configured
2 resources configured
Online: [ fusion-srv1.acme.local fusion-srv2.acme.local ]
Full list of resources:
cluster_ip_A (ocf::heartbeat:IPaddr2): Started fusion-srv1.acme.local
cluster_ip_B (ocf::heartbeat:IPaddr2): Started fusion-srv2.acme.local
Clone Set: fusion_ha-clone [fusion_ha]
Started: [ fusion-srv1.acme.local fusion-srv2.acme.local ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
-
Perform the failover by putting the node currently serving the client connection into standby. On fusion-srv1:
sudo pcs node standby
-
Check the status of the cluster again to verify that cluster_ip_A has migrated over to the second node:
sudo pcs status
The output should be similar to:
Cluster name: fusion_ha
Stack: corosync
Current DC: fusion-srv2 (version 2.0.5-9e909a5bdd) - partition with quorum
Last updated: Fri Jun 1 15:00:00 2024
Last change: Fri Jun 1 14:49:53 2024 by root via cibadmin on fusion-srv1.acme.local
2 nodes configured
2 resources configured
Node fusion-srv1.acme.local: standby
Online: [ fusion-srv2.acme.local ]
Full list of resources:
cluster_ip_A (ocf::heartbeat:IPaddr2): Started fusion-srv2.acme.local
cluster_ip_B (ocf::heartbeat:IPaddr2): Started fusion-srv2.acme.local
Clone Set: fusion_ha-clone [fusion_ha]
Started: [ fusion-srv2.acme.local ]
Stopped: [ fusion-srv1.acme.local ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
At this point, you should wait for the file copy to complete on your Windows client. If the file copy is successful and the output of the copy command is 1 file(s) copied., the failover was successful, and the Fusion File Share Server cluster is working as expected. 🎉
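To return the first node to service after the test, take it out of standby (run on fusion-srv1, mirroring the standby command above); cluster_ip_A should then move back to it, per the location preference configured earlier:
sudo pcs node unstandby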
Conclusion
In this guide, we've configured an active-active Fusion File Share Server cluster with Pacemaker and Corosync. We've also tested the failover process and verified that the cluster is working as expected. You can now proceed to configure additional shares, or further customize the Fusion File Share Server configuration to suit your needs. For example, check out the Performance Tuning Parameters section for a deep dive into optimizing Fusion File Share Server to take full advantage of your hardware and network infrastructure.
For further configuration options, more advanced setups, or any questions you might have, please refer to the official documentation, or contact your sales or support representative.