
Active-Active Cluster for Scale-Out and High Availability

Introduction

This guide explains how to configure Fusion File Share Server in an active-active cluster. This configuration consists of two Fusion File Share Server installations with two separate server nodes, both active and serving the same SMB shares, allowing for uninterrupted operation if any of the nodes fail.

What You'll Learn in This Guide

By reading and following the instructions in this guide, you'll learn:

  1. The Concept of an Active-Active Fusion File Share Server Cluster and its Benefits:

    We will provide you with the necessary information to assess whether an active-active cluster is the right choice for your use case, as well as the benefits it offers your organization.

  2. How to Configure an Active-Active Fusion File Share Server Cluster:

    Step-by-step instructions will guide you through the process of setting up an active-active Fusion File Share Server cluster. This includes setting up the servers, configuring essential settings, and preparing your cluster for evaluation and testing. You'll gain practical knowledge of how to get your cluster up and running quickly and efficiently.

  3. How to Test and Validate Your Active-Active Fusion File Share Server Cluster:

    Make sure your system is highly available and fault tolerant by testing various failure modes and validating that the service is not interrupted. You will be able to assess the suitability of the installation for your production environment.

  4. Optimizing Your Cluster's Performance:

    Learn how to optimize your Fusion File Share Server cluster for better performance, without sacrificing availability and uptime. We will provide you with tips and best practices to tune your cluster and ensure it is running at its best.

The Use Case Described in this Guide

This guide provides steps for quickly setting up an active-active Fusion File Share Server cluster for continuous availability. It describes a simple use case; however, it can be easily adapted to your specific requirements.

The described use case comprises the following:

  1. Two Server Nodes:

    The cluster consists of two server nodes, running Ubuntu 22.04. The choice of operating system is arbitrary and can be replaced with your preferred distribution, although some commands (e.g., package installation) may differ. The number of nodes can be increased to provide additional redundancy and network capacity.

  2. Active Directory Integration:

    The cluster in this example is integrated with Active Directory for user authentication, and the Fusion File Share Server nodes are configured to join the domain.

  3. One Share, Just to Get Started:

    The cluster is configured with a single share, which is used to demonstrate high availability. You can easily expand the configuration to include more shares, as needed.

  4. A Windows Client for Testing:

    To validate the setup and run tests, the guide includes steps for testing the cluster using a Windows client. This involves connecting to the share from a Windows machine, performing basic file operations, and verifying that access to the share continues seamlessly after a node failure.

What is an Active-Active Cluster?

Active-active is a clustering configuration for Fusion File Share Server that provides all three benefits of clustering: continuous availability, scale-out, and high availability. In this mode, multiple servers operate simultaneously, forming a cluster where connections and workloads are distributed across all nodes. This approach leverages the combined networking bandwidth of all servers. It ensures that, in the event of a node failure, only a small number of clients are affected and must wait for their connections to resume on another server.

This configuration is ideal for environments requiring minimal downtime and the ability to scale by adding more servers to handle increased loads, though it is more complex to set up and manage.

Common use cases for active-active clustering include:

  • Business-Critical Applications in Larger Environments:

    Active-active clustering is often used in large environments where downtime is costly and a single SMB server cannot handle the workload.

  • Environments with High Bandwidth and Low Latency Demands:

    Active-active clustering is commonly deployed where bandwidth demands are high. Examples include:

    • Video Production: Multiple users work on different projects simultaneously, requiring substantial bandwidth.
    • Low-Latency Media Streaming: Large numbers of clients require high bandwidth and cannot tolerate buffering delays.

Installing Fusion File Share Server

The Fusion File Share Server is distributed as a compressed archive containing the server binary, command-line utilities, and configuration templates. Typically, the archive's name resembles tuxera-smb-3024.3.22.1-r1-x86_64-jammy-user-evaluation-cluster.tgz.

To install Fusion File Share Server, extract the archive and copy its contents to the relevant system directories:

note

In this example, the Fusion File Share Server binary is copied to /usr/sbin, the utilities to /usr/bin, and the configuration file to /etc. These directories are typically included in the $PATH variable on Linux servers. However, you may choose alternative directories based on your specific requirements.

Perform the following on all cluster nodes
  1. Extract the archive:

    tar -xzf tuxera-smb-3024.3.22.1-r1-x86_64-jammy-user-evaluation-cluster.tgz
  2. Change into the extracted directory:

    cd tuxera-smb-3024.3.22.1-r1-x86_64-jammy-user-evaluation-cluster
  3. Copy the Fusion File Share Server binary to /usr/sbin, or another location of your choice:

    sudo cp -af smb/bin/tsmb-server /usr/sbin/tsmb-server
  4. Copy the utilities to /usr/bin, or another location of your choice:

    sudo cp -af smb/tools/* /usr/bin/
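
Optionally, you can confirm that the copied binaries are found on the $PATH before proceeding. A minimal check, run from the extracted archive directory (the smb/tools/* path matches the step above):

    # Confirm the server binary is on the $PATH
    command -v tsmb-server
    # Confirm each copied utility is on the $PATH
    for tool in smb/tools/*; do
        command -v "$(basename "$tool")" || echo "missing: $(basename "$tool")"
    done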

Prerequisites

Before you begin the installation and configuration of Fusion File Share Server, there are a few prerequisites you should be mindful of.

Active Directory

This guide assumes that you have an Active Directory domain:

  • The Fusion File Share Server cluster nodes will be joined to this domain for user authentication.
  • Active Directory is expected to use Kerberos for authentication.

Server Nodes

These are the machines where you will install Fusion File Share Server.

  • Operating System: Linux.

    • [Supported Distributions ?]
    • [Minimal Kernel Version ?]
    • [Minimal glibc version ?]
    • [Kernel parameters?]
    • [SELinux/AppArmor?]
    • [Can it run in a container? Docker? containerd? lxc/lxd?]
    • [Inside Kubernetes?]
    note

    While the instructions in this guide are based on Ubuntu 22.04, you can use any Linux distribution that meets these requirements; make the necessary adjustments to the commands and package names.

  • Network:

    • Ensure TCP port 445 is open in any firewalls between the server and the client machine.
    • Ensure that the server can communicate with the domain controller over the network for LDAP and RFC2307 queries.
    • It is recommended (but not strictly required) that the server nodes be connected to two separate networks: one solely for SMB traffic, and another for everything else (e.g., storage, SSH). See Network Separation in the Fusion File Share Server Clustering guide for more details.
  • Shared Storage:

    • The server nodes should have access to shared storage, such as a SAN or NAS, to store user data, configuration, and continuous availability-related files.
    note

    Active-active clustering requires the shared storage to be mounted on both nodes at boot time, which in turn requires a clustered or distributed file system, such as GlusterFS or NFS.

    For more information on shared storage requirements and options, see the Shared Storage and Persistent State section of the Fusion File Share Server Clustering guide.

  • DNS:

    • Ensure that the server nodes' DNS settings point to the domain's DNS server.
    • Configure round-robin DNS for load balancing (see Enabling Round-Robin DNS below).
  • Active Directory:

    • Ensure that you have the necessary Administrator credentials to join the server nodes to the domain.
  • NTP:

    • Ensure that the domain controller and server nodes have properly configured NTP, since Kerberos requires a time skew of no more than 5 minutes between the client and the server.
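
As a quick sanity check of the NTP requirement, you can confirm on each node that the clock is synchronized. A minimal sketch, assuming the standard Ubuntu 22.04 tooling (systemd's timedatectl, and chrony if it happens to be installed):

    # Shows whether the system clock is synchronized and the NTP service is active
    timedatectl status
    # If chrony is in use, shows the current offset from the configured time source
    chronyc tracking 2>/dev/null || true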

Client

This is the machine from which you will access the SMB share served by your Fusion File Share Server.

  • Operating system: Windows 10, 11, or Windows Server 2016, 2019, 2022 connected to the same network as the server.
  • Network:
    • Ensure TCP port 445 is open in any firewalls between the client and the server.
  • Active Directory:
    • The client machine should be joined to the same Active Directory domain as the Fusion File Share Server cluster.

[Optional] Configuring the Secondary Network Interfaces

As mentioned previously, it is recommended, but not strictly required, that the server nodes be connected to two separate networks: one dedicated to SMB traffic and one for everything else.

In this example, we will configure a secondary network interface on each node, connected to the dedicated SMB network:

  • Our primary client network is on ens18. This interface uses DHCP to obtain an IP address.
  • Our secondary, SMB-only network is on ens19.
    • This interface will have a static IP address for both nodes:
      • On fusion-srv1: 10.13.0.100.
      • On fusion-srv2: 10.13.0.101.
    • This interface will also be assigned the cluster's floating IP addresses, 10.13.0.10 and 10.13.0.11, which will be distributed among the active nodes.
note

This guide is based on Ubuntu 22.04 and uses netplan to configure the network interfaces.

Perform the following on the first node (e.g., on fusion-srv1)

Edit the /etc/netplan/01-netcfg.yaml file to add the secondary network interface configuration:

network:
  version: 2
  renderer: networkd
  ethernets:
    ens18:
      dhcp4: yes
    ens19:
      addresses: [10.13.0.100/16]

Apply the changes:

sudo netplan apply
Perform the following on the second node (e.g., on fusion-srv2)

Edit the /etc/netplan/01-netcfg.yaml file to add the secondary network interface configuration:

network:
  version: 2
  renderer: networkd
  ethernets:
    ens18:
      dhcp4: yes
    ens19:
      addresses: [10.13.0.101/16]

Apply the changes:

sudo netplan apply
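
After applying the configuration on each node, you can verify that the static address is assigned to the SMB-only interface (the interface name ens19 and the addresses match this example):

    # On fusion-srv1 this should show 10.13.0.100/16; on fusion-srv2, 10.13.0.101/16
    ip -4 addr show dev ens19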

Joining the Domain and Adding UPN

Before you begin the cluster setup, you need to join the cluster nodes to the Active Directory domain. This will be achieved by using SSSD and adcli. (net from the Samba suite is also supported, but not covered in this guide.)

Perform the following on all cluster nodes
  1. Install the necessary packages:

    sudo apt -y update && sudo apt upgrade -y
    sudo apt -y install sssd-ad sssd-tools realmd adcli krb5-user sssd-krb5
  2. On each server, set the hostname to the FQDN of the server (we will use fusion-srv1.acme.local and fusion-srv2.acme.local as examples):

    On fusion-srv1:

    sudo hostnamectl set-hostname fusion-srv1.acme.local

    On fusion-srv2:

    sudo hostnamectl set-hostname fusion-srv2.acme.local
  3. Configure Kerberos by editing the /etc/krb5.conf file. Note that the default_realm should match the domain name in uppercase:

    [libdefaults]
    default_realm = ACME.LOCAL
    rdns = false
  4. Confirm domain discovery by running the following command:

    sudo adcli info acme.local

    The output should be similar to:

    [domain]
    domain-name = acme.local
    domain-short = ACME
    domain-forest = acme.local
    domain-controller = DC1.acme.local
    domain-controller-site = Default-First-Site-Name
    domain-controller-flags = pdc gc ldap ds kdc timeserv closest writable full-secret ads-web
    domain-controller-usable = yes
    domain-controllers = DC1.acme.local
    [computer]
    computer-site = Default-First-Site-Name
Perform the following on only one of the cluster nodes (e.g., on fusion-srv1)
  1. Create SSSD configuration at /etc/sssd/sssd.conf, replacing acme.local with your domain name, with the krb5_realm parameter in uppercase:

    [sssd]
    domains = acme.local
    config_file_version = 2
    services = nss

    [domain/acme.local]
    ad_domain = acme.local
    krb5_realm = ACME.LOCAL
    cache_credentials = True
    id_provider = ad
    krb5_store_password_if_offline = True
    ldap_id_mapping = True
    use_fully_qualified_names = false
    access_provider = ad
  2. Set the permissions on the SSSD configuration file:

    sudo chmod 600 /etc/sssd/sssd.conf
  3. Create a computer account and a krb5.keytab file for the cluster. We will use fusion-srv (without a number) as the cluster's computer account name. The domain and the computer name must be in uppercase:

    sudo adcli join --domain ACME.LOCAL --service-name=cifs --computer-name FUSION-SRV --host-fqdn fusion-srv.ACME.LOCAL -v
    note

    The --computer-name and --host-fqdn parameters refer to the name by which the entire cluster will be known in the domain, so it is fine that they do not match the node's hostname. It is important, however, that the domain name portion of --host-fqdn matches the domain name portion of the node's hostname (albeit in uppercase).

  4. Enable and restart the SSSD service:

    sudo systemctl enable sssd
    sudo systemctl restart sssd
  5. Confirm that the computer is part of the domain and is able to query the domain controller by querying for information about a user (e.g., johndoe):

    id johndoe

    The output should be similar to:

    uid=1414601123(johndoe) gid=1414600513(domain users)
    groups=1414600513(domain users),1414601115(testgroup)
  6. Copy the krb5.keytab file and the SSSD configuration to the other cluster node:

    note

    For the following command to work, the public SSH key from the first node must be added to ~root/.ssh/authorized_keys on the second node.

    sudo scp /etc/sssd/sssd.conf fusion-srv2:/etc/sssd/sssd.conf
    sudo scp /etc/krb5.keytab fusion-srv2:/etc/krb5.keytab
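
Before moving on, it is worth confirming that the second node can also use the copied configuration. A minimal check on fusion-srv2, mirroring the steps already performed on the first node:

    # Restrict permissions on the copied SSSD configuration
    sudo chmod 600 /etc/sssd/sssd.conf
    # Enable and restart SSSD so it picks up the copied configuration and keytab
    sudo systemctl enable sssd
    sudo systemctl restart sssd
    # Lookups of domain users should now succeed on this node as well
    id johndoe
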
Perform the following on the domain controller
  1. Open the Active Directory Users and Computers console, right-click the FUSION-SRV computer account, and click Properties:

    Active Directory Users and Computers

  2. Open the Attribute Editor tab, and find the userPrincipalName attribute. Click on Edit:

    Attribute Editor

    note

    If you don't see the Attribute Editor tab, you may need to enable it by clicking on View > Advanced Features in the Active Directory Users and Computers console.

  3. Make sure the userPrincipalName attribute is set to cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL (replace the domain name and the computer name with your own), and click OK.

note

Alternatively, you can use PowerShell:

Set-ADComputer -Identity FUSION-SRV -UserPrincipalName cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL

Enabling Round-Robin DNS

In order for the cluster nodes to be able to serve the SMB shares in an active-active configuration, the DNS server should be configured to return multiple IP addresses for the same fusion-srv hostname. This is known as round-robin DNS.

Perform the following on the DNS server
  1. In the DNS Manager snap-in, right-click on the DNS server and select Properties:

    DNS Manager

  2. Under the Advanced tab, make sure the Enable round robin checkbox is checked, and click OK:

    DNS Properties

  3. In the View menu, make sure Advanced is enabled:

    DNS Round Robin

  4. Navigate to the Forward Lookup Zones > acme.local zone and click Action > New Host (A or AAAA):

    DNS Round Robin

  5. In the New Host dialog, enter the fusion-srv hostname and the IP address of the first node. Also, set the TTL to 0:0:0:0:

    DNS Round Robin

  6. Then, repeat the action for the second node's IP address, setting the TTL as well:

    DNS Round Robin
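
Once both host records are in place, you can confirm from one of the server nodes that the cluster name resolves to both addresses; with round robin enabled, the order of the returned addresses should rotate between queries:

    # Both registered addresses should be returned for the cluster name
    nslookup fusion-srv.acme.local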

Configuring Fusion File Share Server

Now that the cluster nodes are configured to operate in the domain environment, we can proceed with the configuration of the Fusion File Share Server cluster.

note

The following steps assume that the shared storage is already mounted on /mnt/shared on the node where you are performing the configuration. If the shared storage is mounted on a different path, adjust the paths accordingly.
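
If the shared storage is not mounted yet, the exact steps depend on your storage choice. As a minimal sketch only, assuming an NFS export at nfs-server:/export/fusion (a hypothetical path; substitute your own clustered or distributed file system as discussed in the Clustering guide):

    # Install the NFS client and mount the shared volume on both nodes
    sudo apt -y install nfs-common
    sudo mkdir -p /mnt/shared
    sudo mount -t nfs nfs-server:/export/fusion /mnt/shared
    # Add an fstab entry so the volume is mounted again at boot
    echo 'nfs-server:/export/fusion /mnt/shared nfs defaults,_netdev 0 0' | sudo tee -a /etc/fstab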

As mentioned before (and in the Shared Storage and Persistent State section of the Clustering guide), for the configuration and the persistent state to be available on both nodes, we will store them on the shared storage that is mounted on both nodes. In this guide, we assume it is mounted on /mnt/shared. This directory will also contain the files shared by the SMB server. The directory structure of the shared storage volume in this example is as follows:

/mnt/shared
├── .tsmb # The configuration and persistent state directory, not visible to SMB clients, containing
│ ├── ca # - The persistent file handle database
│ ├── etc # - The configuration file
│ ├── privilegedb # - The privilege database
│ └── tcp_tickle # - The connection recovery database
└── sh1 # The data directory for the `SH1` share, visible to SMB clients
Perform the following on only one of the cluster nodes (e.g., on fusion-srv1)
  1. Create the directory structure for the Fusion File Share Server configuration and persistent state, the runtime state directory, and the data directory on the shared storage volume:

    sudo mkdir -p /mnt/shared/.tsmb/{ca,etc,privilegedb,tcp_tickle}
    sudo mkdir /var/lib/tsmb
    sudo mkdir /mnt/shared/sh1
  2. Create the configuration file at /mnt/shared/.tsmb/etc/tsmb.conf:

    ### GLOBAL CONFIGURATION ###

    [global]

    # The runtime state directory
    runstate_dir = /var/lib/tsmb

    # Enable scale-out, as we are configuring an active-active cluster
    scale_out = true

    # Perform authentication with Active Directory
    userdb_type = ad

    # The domain name
    domain = acme.local

    # The computer account name that clients will use to connect to SMB shares
    server_name = fusion-srv

    # Enable the persistent file handle database
    ca = true

    # Path to the persistent file handle database
    ca_path = /mnt/shared/.tsmb/ca

    # Path to the privilege database
    privilegedb = /mnt/shared/.tsmb/privilegedb

    # Enable connection recovery via TCP tickle and set the path to the database
    tcp_tickle = true
    tcp_tickle_params = path=/mnt/shared/.tsmb/tcp_tickle

    # Enable logs at log level 4, and log to the specified file
    log_level = 4
    log_destination = file
    log_params = path=/var/lib/tsmb/tsmb.log

    # Listen on port 445 on the SMB-only network interface
    listen = ens19,0.0.0.0,IPv4,445,DIRECT_TCP

    [/global]

    ### SHARE CONFIGURATION ###

    # Define the `SH1` share
    [share]

    # The share's name
    netname = SH1

    # The comment that will be displayed in the share's properties
    remark = Test Share

    # The path to the share's data directory
    path = /mnt/shared/sh1

    # The share's permissions (we're testing, so we're giving full access to everyone)
    permissions = everyone:full

    [/share]

    In the annotated example above, we have configured the following:

    • The scale_out parameter is set to true:

      As we are configuring an active-active cluster, we enable scale-out. This parameter enables or disables the scale-out feature, which allows multiple Fusion File Share Server nodes to serve the same shares simultaneously.

    • The userdb_type parameter is set to ad:

      This indicates that we are using Active Directory for user authentication.

    • The domain parameter is set to acme.local:

      This indicates the domain name. Replace this with your own domain name.

    • The server_name parameter is set to fusion-srv:

      This is the computer account name we've chosen when joining the cluster to Active Directory. That is the name clients will use to connect to SMB shares.

    • The ca and tcp_tickle parameters enable the persistent file handle database and connection recovery, while the ca_path, privilegedb, and tcp_tickle_params parameters point to the persistent file handle database, the privilege database, and the connection recovery database, respectively.

    • The log_level, log_destination, and log_params parameters set the log level to 4, direct logging to a file, and specify the path to the log file, respectively. These can be changed later, but for now we use log level 4 so that we can verify the configuration and test some functionality.

    • The listen parameter is set to ens19,0.0.0.0,IPv4,445,DIRECT_TCP

      This indicates that the server will listen on port 445 on the ens19 network interface, which is the SMB-only network interface.

    • The share SH1 is defined with:

      • The netname parameter set to SH1.
      • The remark parameter set to Test Share.
      • The path parameter set to /mnt/shared/sh1.
      • The permissions parameter set to everyone:full.

      Additional share configuration is possible. For more advanced share configuration parameters, see the SMB Features and Settings and Authorization and Access Management sections, where some topics relate to specific share behavior, or refer to the share configuration reference for a complete list of share configuration parameters.

      It is also possible to set the ca and ca_params parameters for individual shares, for example to disable continuous availability for a specific share or to use a different path for that share's persistent file handle database.
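
For illustration, a sketch of an additional share section that opts out of continuous availability could look like the following (the SCRATCH share name and path are placeholders; consult the share configuration reference for the exact per-share parameters supported by your version):

    # A hypothetical second share with continuous availability disabled
    [share]
    netname = SCRATCH
    remark = Scratch space without persistent handles
    path = /mnt/shared/scratch
    permissions = everyone:full
    ca = false
    [/share]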

[Optional] Testing Active Directory Integration

You can now check that the Fusion File Share Server cluster nodes can communicate with the Active Directory domain controller. You can also test how Fusion File Share Server handles user authentication.

Perform the following on both cluster nodes
  1. First, make sure that the domain controller is discoverable via DNS.

    Check for the SRV record for _kerberos._tcp for your domain:

    nslookup -q=srv _kerberos._tcp.acme.local

    The output should be similar to:

    Server:         127.0.0.53
    Address: 127.0.0.53#53

    Non-authoritative answer:
    _kerberos._tcp.acme.local service = 0 100 88 dc1.acme.local.

    Authoritative answers can be found from:

    Then, check for the SRV record for _kpasswd._tcp for your domain:

    nslookup -q=srv _kpasswd._tcp.acme.local

    The output should be similar to:

    Server:         127.0.0.53
    Address: 127.0.0.53#53

    Non-authoritative answer:
    _kpasswd._tcp.acme.local service = 0 100 464 dc1.acme.local.

    Authoritative answers can be found from:
  2. Check that you are able to obtain tickets for the cifs service:

    sudo kinit -k -V cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL -t /etc/krb5.keytab

    The output should be similar to:

    keytab specified, forcing -k
    Using default cache: /tmp/krb5cc_0
    Using principal: cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL
    Using keytab: /etc/krb5.keytab
    Authenticated to Kerberos v5
  3. Check that the key version number (KVNO) for the cifs service matches between KDC and the keytab file:

    sudo klist -k -t /etc/krb5.keytab

    The output should be similar to:

    Keytab name: FILE:/etc/krb5.keytab
    KVNO Principal
    ---- --------------------------------------------------------------------------
    2 FUSION-SRV$@ACME.LOCAL
    2 FUSION-SRV$@ACME.LOCAL
    2 FUSION-SRV$@ACME.LOCAL
    2 FUSION-SRV$@ACME.LOCAL
    2 FUSION-SRV$@ACME.LOCAL
    2 FUSION-SRV$@ACME.LOCAL
    2 cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL
    2 cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL
    2 cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL
    2 cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL
    2 cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL
    2 cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL

    . . .
  4. Check that you are able to obtain tickets for user accounts (e.g., johndoe):

    sudo kinit -V johndoe@ACME.LOCAL

    The output should be similar to:

    Using default cache: /tmp/krb5cc_0
    Using principal: johndoe@ACME.LOCAL
    Authenticated to Kerberos v5

    Then, list the tickets:

    sudo klist

    The output should be similar to:

    Ticket cache: FILE:/tmp/krb5cc_0
    Default principal: johndoe@ACME.LOCAL

    Valid starting Expires Service principal
    06/01/2024 14:42:22 06/02/2024 00:42:22 krbtgt/ACME.LOCAL@ACME.LOCAL
    renew until 06/08/2024 14:42:12
  5. Now, you can temporarily start the Fusion File Share Server on the node, to validate domain authentication further.

    • First, start Fusion File Share Server on the node:

      sudo /usr/sbin/tsmb-server -c /mnt/shared/.tsmb/etc/tsmb.conf
    • Then, from a Windows client, try to access the share using the johndoe user account. You should be able to access the share without any issues.

    • After testing, stop the Fusion File Share Server by pressing Ctrl-C in the terminal.

    • Examine the logs at /var/lib/tsmb/tsmb.log to see if there was a successful LDAP connection to the domain controller. The log should contain lines similar to:

    Using principal cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL for AD client
    Resolving SRV RR _ldap._tcp.acme.local
    Found URI[0]: ldap://dc1.acme.local:389
    Resolving SRV RR _gc._tcp.acme.local
    Found URI[0]: ldap://dc1.acme.local:3268
    Trying ldap://dc1.acme.local:389
    Connected to ldap://dc1.acme.local:389
    Our domain SID S-1-5-21-788087510-3421900764-663072633
    Our domain NETBIOS-Name 'ACME'
    • You should also see a ticket inside Fusion File Share Server's ticket cache at /var/lib/tsmb/tsmb_ccache. You can examine it by running:
    sudo klist -c /var/lib/tsmb/tsmb_ccache

    The output should be similar to:

    Ticket cache: FILE:/var/lib/tsmb/tsmb_ccache
    Default principal: cifs/fusion-srv.ACME.LOCAL@ACME.LOCAL

    Valid starting Expires Service principal
    06/01/2024 14:52:32 06/01/2024 15:52:32 krbtgt/ACME.LOCAL@ACME.LOCAL
    06/01/2024 14:52:32 06/01/2024 15:52:32 ldap/dc1.acme.local@
    06/01/2024 14:52:32 06/01/2024 15:52:32 ldap/dc1.acme.local@ACME.LOCAL

Installing and Configuring Pacemaker and Corosync

Now that the Fusion File Share Server is configured, we can proceed with the installation and configuration of the Pacemaker and Corosync cluster software.

Perform the following on all cluster nodes (e.g., on fusion-srv1 and fusion-srv2)
  1. Install the necessary packages:

    sudo apt -y install pacemaker corosync pcs
  2. Set password for the hacluster user:

    sudo passwd hacluster
  3. Configure hostname resolution on the nodes by editing the /etc/hosts file. Add the following lines:

    10.13.0.100 fusion-srv1
    10.13.0.101 fusion-srv2
    note

    Alternatively, you can use DNS for hostname resolution. You can add the following A records to your DNS server:

    fusion-srv1    IN A 10.13.0.100
    fusion-srv2 IN A 10.13.0.101
  4. Start pcsd:

    sudo systemctl start pcsd
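
You can confirm that pcsd is running on each node before continuing (and, if you want it to start automatically after a reboot, enable it as well):

    # Verify the pcs daemon is active
    systemctl status pcsd --no-pager
    # Optionally have pcsd start at boot
    sudo systemctl enable pcsd
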
Perform the following on only one of the cluster nodes (e.g., on fusion-srv1)
  1. Destroy the default cluster configuration:

    sudo pcs cluster destroy

    The output should be similar to:

    Shutting down pacemaker/corosync services...
    Killing any remaining services...
    Removing all cluster configuration files...
  2. Authenticate the nodes of the cluster:

    sudo pcs host auth fusion-srv1 fusion-srv2 -u hacluster

    You will be prompted to enter the password for the hacluster user that you've set earlier, and should see the following output:

    Password:
    fusion-srv1: Authorized
    fusion-srv2: Authorized
  3. Set up the cluster:

    sudo pcs cluster setup fusion_ha fusion-srv1 fusion-srv2 --force

    The output should be similar to:

    Destroying cluster on nodes: fusion-srv1, fusion-srv2...
    fusion-srv1: Stopping Cluster (pacemaker)...
    fusion-srv2: Stopping Cluster (pacemaker)...
    fusion-srv1: Successfully destroyed cluster
    fusion-srv2: Successfully destroyed cluster

    Sending 'pacemaker_remote authkey' to 'fusion-srv1', 'fusion-srv2'
    fusion-srv1: successful distribution of the file 'pacemaker_remote authkey'
    fusion-srv2: successful distribution of the file 'pacemaker_remote authkey'
    Sending cluster config files to the nodes...
    fusion-srv1: Succeeded
    fusion-srv2: Succeeded

    Synchronizing pcsd certificates on nodes fusion-srv1, fusion-srv2...
    fusion-srv1: Success
    fusion-srv2: Success
    Restarting pcsd on the nodes in order to reload the certificates...
    fusion-srv1: Success
    fusion-srv2: Success
  4. Start the cluster:

    sudo pcs cluster start --all

    The output should be similar to:

    10.13.0.100: Starting Cluster...
    10.13.0.101: Starting Cluster...
  5. Disable STONITH:

    sudo pcs property set stonith-enabled=false
  6. Set the cluster to ignore low quorum:

    sudo pcs property set no-quorum-policy=ignore
  7. Configure the floating IP addresses 10.13.0.10 and 10.13.0.11 as cluster resources, set them to be monitored every second, and set them to prefer the first and second node, respectively:

    sudo pcs resource create cluster_ip_A \
    ocf:heartbeat:IPaddr2 \
    ip=10.13.0.10 cidr_netmask=32 \
    op monitor interval=1s

    sudo pcs resource create cluster_ip_B \
    ocf:heartbeat:IPaddr2 \
    ip=10.13.0.11 cidr_netmask=32 \
    op monitor interval=1s

    sudo pcs constraint location cluster_ip_A prefers fusion-srv1.acme.local

    sudo pcs constraint location cluster_ip_B prefers fusion-srv2.acme.local
  8. Configure Fusion File Share Server as a cluster resource:

    sudo pcs resource create fusion_ha \
    ocf:heartbeat:anything \
    binfile=/usr/sbin/tsmb-server \
    cmdline_options="-c /mnt/shared/.tsmb/etc/tsmb.conf"

    sudo pcs resource clone fusion_ha
  9. Stop the cluster to make sure the configuration is correct:

    sudo pcs cluster stop --all
  10. Start the cluster:

    sudo pcs cluster start --all

At this point, the cluster should be up and running, and Fusion File Share Server should be serving the SH1 share at \\FUSION-SRV\SH1.
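
You can spot-check this on either node: each node should hold one of the floating addresses, and the server process should have been started by Pacemaker. A quick check (the interface name and paths match this example):

    # One of the floating IPs (10.13.0.10 or 10.13.0.11) should appear here
    ip -4 addr show dev ens19
    # The Fusion File Share Server process started by the cluster
    pgrep -a tsmb-server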

Testing Failover

Now that the cluster is up and running, we can test the failover process. This would entail:

  1. Accessing the share from a Windows client and mapping it to a drive letter:

    This will verify that:

    • The cluster software is configured correctly and starts Fusion File Share Server on one of the nodes.
    • Fusion File Share Server is configured correctly, is able to authenticate users, and is serving the share.
  2. Creating a large file on the Windows client and copying it to the share:

    This file will be used to test the failover process. It should be large enough to take more than a few seconds to copy, so that we can perform a failover while the copy is in progress.

  3. Performing a failover while the file is copying:

    If the file copy continues without interruption, the failover was successful.

Perform the following on the Windows client
  1. In a new CMD or a PowerShell window, map the share to a drive letter, authenticating as an existing domain user (e.g., johndoe):

    PS C:\Users\johndoe> net use Z: \\FUSION-SRV\SH1 /USER:johndoe@acme.local

    You will be prompted to enter the password for the johndoe user.

    If successful, you should have a new Z: drive mapped to the share.

  2. Create a large file on the Windows client. Depending on your network speed, you'd want to create a file that would take more than a few seconds to copy, for example a 10GB file:

    PS C:\Users\johndoe> fsutil file createnew C:\Users\johndoe\largefile.txt 10737418240

    This will create a 10GB file named largefile.txt in the C:\Users\johndoe directory.

  3. Copy the file to the share:

    PS C:\Users\johndoe> copy C:\Users\johndoe\largefile.txt Z:\

    The file copy should start and should take enough time to allow us to perform the failover.

Perform the following on the currently active cluster node (e.g., on fusion-srv1)
  1. First, check the status of the cluster:

    sudo pcs status

    The output should be similar to:

    Cluster name: fusion_ha
    Stack: corosync
    Current DC: fusion-srv1 (version 2.0.5-9e909a5bdd) - partition with quorum
    Last updated: Fri Jun 1 15:00:00 2024
    Last change: Fri Jun 1 14:49:53 2024 by root via cibadmin on fusion-srv1.acme.local

    2 nodes configured
    2 resources configured

    Online: [ fusion-srv1.acme.local fusion-srv2.acme.local ]

    Full list of resources:

    cluster_ip_A (ocf::heartbeat:IPaddr2): Started fusion-srv1.acme.local
    cluster_ip_B (ocf::heartbeat:IPaddr2): Started fusion-srv2.acme.local
    Clone Set: fusion_ha-clone [fusion_ha]
    Started: [ fusion-srv1.acme.local fusion-srv2.acme.local ]

    Daemon Status:
    corosync: active/disabled
    pacemaker: active/disabled
    pcsd: active/enabled
  2. Perform the failover:

    sudo pcs node standby
  3. Check the status of the cluster again, to verify that cluster_ip_A has migrated over to the second node:

    sudo pcs status

    The output should be similar to:

    Cluster name: fusion_ha
    Stack: corosync
    Current DC: fusion-srv2 (version 2.0.5-9e909a5bdd) - partition with quorum
    Last updated: Fri Jun 1 15:00:00 2024
    Last change: Fri Jun 1 14:49:53 2024 by root via cibadmin on fusion-srv1.acme.local

    2 nodes configured
    2 resources configured

    Node fusion-srv1.acme.local: standby
    Online: [ fusion-srv2.acme.local ]

    Full list of resources:

    cluster_ip_A (ocf::heartbeat:IPaddr2): Started fusion-srv2.acme.local
    cluster_ip_B (ocf::heartbeat:IPaddr2): Started fusion-srv2.acme.local
    Clone Set: fusion_ha-clone [fusion_ha]
    Started: [ fusion-srv2.acme.local ]
    Stopped: [ fusion-srv1.acme.local ]

    Daemon Status:
    corosync: active/disabled
    pacemaker: active/disabled
    pcsd: active/enabled

At this point, you should wait for the file copy to complete on your Windows client. If the file copy is successful, and the output of the copy command is 1 file(s) copied., the failover was successful, and the Fusion File Share Server cluster is working as expected. 🎉
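
When the test is complete, you can return the node you placed in standby to active duty so that both nodes serve clients again (run this on the node that was put in standby):

    # Bring the standby node back online; Pacemaker will restart its resources there
    sudo pcs node unstandby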

Conclusion

In this guide, we've configured an active-active Fusion File Share Server cluster with Pacemaker and Corosync. We've also tested the failover process, and verified that the cluster is working as expected. You can now proceed to configure additional shares, or to further customize the Fusion File Share Server configuration to suit your needs. For example, check out the Performance Tuning Parameters section for a deep dive into optimizing Fusion File Share Server to take full advantage of your hardware and network infrastructure.

For further configuration options, more advanced setups, or any questions you might have, please refer to the official documentation, or contact your sales or support representative.