Configure basic Linux High Availability Cluster in Ubuntu with Corosync

Jellyfish Cluster – photo by robin on flickr

[Read also: HA Cluster with DRBD file sync which adds file sync configuration between cluster nodes]

[UPDATED on March 7, 2017: tested the configuration also with Ubuntu 16.04 LTS]

This post shows how to configure a basic High Availability cluster in Ubuntu using Corosync (cluster manager) and Pacemaker (cluster resource manager), software available in the Ubuntu repositories (tested on Ubuntu 14.04 and 16.04 LTS). More information regarding Linux HA can be found here.

The goal of this post is to set up a freeradius service in HA. To do this we use two Ubuntu 14.04 or 16.04 LTS Server nodes, announcing a single virtual IP from the active cluster node. Notice that in this scenario each freeradius cluster instance is a standalone instance; I don’t cover application replication/synchronization between the nodes (rsync or shared disk via DRBD). Maybe I can do a new post in the future 🙂 [I did the post]

Convention:

  • PRIMARY – the name of the primary node
  • PRIMARY_IP – the IP address of the primary node
  • SECONDARY – the name of the secondary node
  • SECONDARY_IP – the IP address of the secondary node
  • VIP – the IP announced from the master node of the cluster
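
For example, in a lab setup the placeholders might map to purely illustrative values like these:

PRIMARY = node1, PRIMARY_IP = 192.168.1.11
SECONDARY = node2, SECONDARY_IP = 192.168.1.12
VIP = 192.168.1.10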

First of all we install the needed packages

PRIMARY/SECONDARY# apt-get install pacemaker
PRIMARY# apt-get install haveged

and then we can start configuring Corosync, generating on the PRIMARY node the key to be shared between the cluster nodes (using the haveged package to provide entropy).

PRIMARY# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
[...]
Press keys on your keyboard to generate entropy (bits = 1000).
Writing corosync key to /etc/corosync/authkey.

Now we can remove the haveged package and copy the shared key from the PRIMARY to the SECONDARY node

PRIMARY# apt-get remove --purge haveged
PRIMARY# apt-get autoremove
PRIMARY# apt-get clean
PRIMARY# scp /etc/corosync/authkey user@SECONDARY:/tmp
SECONDARY# mv /tmp/authkey /etc/corosync
SECONDARY# chown root:root /etc/corosync/authkey
SECONDARY# chmod 400 /etc/corosync/authkey
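
If you want a quick sanity check, compare the key’s checksum on both nodes (the two hashes must match):

PRIMARY# sha256sum /etc/corosync/authkey
SECONDARY# sha256sum /etc/corosync/authkey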

We are now ready to configure both cluster nodes, telling Corosync about the cluster members, the binding IPs and so on. To do this edit /etc/corosync/corosync.conf and add a new section (nodelist) at the end of the file on both the PRIMARY and SECONDARY nodes, as follows.

[Ubuntu 16.04] don’t add the “name: …” line in the nodelist section: the Corosync version installed in 16.04 doesn’t support this directive and your cluster will not start. By default the node names are taken from the host name (uname -n).

file: /etc/corosync/corosync.conf
[...]
totem {
[...]
interface {
 # The following values need to be set based on your environment 
 ringnumber: 0
 bindnetaddr: <PRIMARY_IP or SECONDARY_IP based on the node>
 mcastaddr: 226.94.1.1
 mcastport: 5405
 }
}
[... end of file ...]

nodelist {
 node {
  ring0_addr: <PRIMARY_IP>
  name: primary   # node name - DON'T ADD THIS LINE ON 16.04
  nodeid: 1       # node numeric ID
 }
 node {
  ring0_addr: <SECONDARY_IP>
  name: secondary # node name - DON'T ADD THIS LINE ON 16.04
  nodeid: 2       # node numeric ID
 }
}

Now we configure Corosync to use the Cluster Resource Manager Pacemaker. To do this create the new file /etc/corosync/service.d/pcmk with the following content

[Ubuntu 16.04] First create the /etc/corosync/service.d/ directory with the command # mkdir /etc/corosync/service.d/

file: /etc/corosync/service.d/pcmk
service {
 name: pacemaker
 ver: 1
}

Then enable Corosync by setting the START parameter to yes

file: /etc/default/corosync
START=yes
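
If you prefer doing it from the shell instead of an editor, a one-liner like this does the job (it rewrites whatever START value is already there):

PRIMARY/SECONDARY# sed -i 's/^START=.*/START=yes/' /etc/default/corosync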

Corosync is ready to be started. Here are the start and verify commands

PRIMARY/SECONDARY# service corosync start
[...]
PRIMARY/SECONDARY# service corosync status
● corosync.service - Corosync Cluster Engine
 Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
 Active: active (running) since [...]
[...]
PRIMARY/SECONDARY# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(<PRIMARY_IP>) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.740229595.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.740229595.ip (str) = r(0) ip(<SECONDARY_IP>)
runtime.totem.pg.mrp.srp.members.740229595.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.740229595.status (str) = joined
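
Another quick check is corosync-cfgtool, which prints the local node ID and the status of each ring; the output should look roughly like this:

PRIMARY# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
 id = <PRIMARY_IP>
 status = ring 0 active with no faults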

Now it’s time to configure Pacemaker, our Cluster Resource Manager.
We enable Pacemaker at boot time, setting its start priority to 20 (Corosync has 19), then we start the service

PRIMARY/SECONDARY# update-rc.d pacemaker defaults 20 01
PRIMARY/SECONDARY# service pacemaker start
[...]
PRIMARY/SECONDARY# service pacemaker status
● pacemaker.service - Pacemaker High Availability Cluster Manager
 Loaded: loaded (/lib/systemd/system/pacemaker.service; enabled; vendor preset: enabled)
 Active: active (running) since [...]
[...]

All the services are (hopefully) in the right state and we can check with the crm utility.

[Ubuntu 14.04] the node names will be the ones defined in the file /etc/corosync/corosync.conf

[Ubuntu 16.04] the node names will be taken from the host name (uname -n)

PRIMARY/SECONDARY# crm status
Last updated: [...]
Last change: [...] via crm_node on primary
Stack: corosync
Current DC: primary (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
0 Resources configured

Online: [ primary secondary ]

We see both nodes (primary and secondary) online, with the numeric ID of the current DC node shown in parentheses.
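
If you are unsure which name Pacemaker is using for the local node, you can ask it directly:

PRIMARY# crm_node -n
primary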

Now that the cluster infrastructure is OK we do some fine tuning:

  • stonith disabled: we avoid automatic fencing (forced shutdown) of cluster nodes, which is useless in a two-node cluster;
  • quorum policy disabled: in a two-node cluster we want the cluster up & running also with a single node.
PRIMARY# crm configure property stonith-enabled=false
PRIMARY# crm configure property no-quorum-policy=ignore
PRIMARY/SECONDARY# crm configure show
node $id="1" primary
node $id="2" secondary
property $id="cib-bootstrap-options" \
 dc-version="1.1.10-42f2063" \
 cluster-infrastructure="corosync" \
 stonith-enabled="false" \
 no-quorum-policy="ignore"
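
Before adding resources it can be useful to validate the current configuration; crm_verify complains only if something is wrong:

PRIMARY# crm_verify -L -V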

We are ready to add resources (Resource Agents) to Pacemaker and, as we said before, we will add an IP address (the VIP) and the freeradius system service (which we need to install first)

PRIMARY/SECONDARY# apt-get install freeradius

A Resource Agent is “a standardized interface for a cluster resource. It translates a standard set of operations into steps specific to the resource or application, and interprets their results as success or failure.” (have a look here for more information).

We can use two kinds of Resource Agents (both can be browsed with the crm shell, as shown after this list):

  • LSB: those found in the /etc/init.d/ directory and provided by the OS; freeradius will be one of these;
  • OCF: specific resources that can also be downloaded and installed from the web, an extension of the LSB resources; the VIP will be one of these.
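
To browse the Resource Agents available on your system you can use the crm shell:

PRIMARY# crm ra classes
PRIMARY# crm ra list lsb
PRIMARY# crm ra list ocf heartbeat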

First we configure the VIP, which is an OCF resource called IPaddr2 (bound to the eth0 interface)

PRIMARY# crm configure primitive vip1 ocf:heartbeat:IPaddr2 params ip="<VIP>" nic="eth0" op monitor interval="10s"
PRIMARY# crm configure show
node $id="1" primary
node $id="2" secondary
primitive vip1 ocf:heartbeat:IPaddr2 \
 params ip="<VIP>" nic="eth0" \
 op monitor interval="10s" \
 meta target-role="Started"
[...]
PRIMARY# crm status
Last updated: [...]
Last change: [...] via cibadmin on primary
Stack: corosync
Current DC: primary (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
1 Resources configured

Online: [ primary secondary ]

vip1 (ocf::heartbeat:IPaddr2): Started primary
PRIMARY#

The VIP (resource vip1) is started on the primary node and we can check this directly on the nodes

PRIMARY# ip addr show 
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
 [...]
 inet <PRIMARY_IP> brd <PRIMARY_BROADCAST> scope global eth0
 valid_lft forever preferred_lft forever
 inet <VIP>/32 brd <VIP_BROADCAST> scope global eth0
 valid_lft forever preferred_lft forever
 [...]

SECONDARY# ip addr show
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
 [...]
 inet <SECONDARY_IP> brd <SECONDARY_BROADCAST> scope global eth0
 valid_lft forever preferred_lft forever
 [...]
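
From another host on the same subnet (here called CLIENT, just as an example) you can also verify that the VIP answers:

CLIENT$ ping -c 3 <VIP>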

On the network side we are now OK, so let’s proceed with freeradius clustering. We add the LSB resource on the PRIMARY node

PRIMARY# crm configure primitive freeradius lsb:freeradius \
 op monitor interval="5s" timeout="15s" \
 op start interval="0" timeout="15s" \
 op stop interval="0" timeout="15s" \
 meta target-role="Started"

We have two resources configured (vip1 and freeradius) and the cluster can start each resource on a different node. So we clone the freeradius resource, allowing the freeradius service to be active on both nodes at the same time (in this particular case it is the right choice and makes the switch faster when the cluster fails over)

PRIMARY# crm configure clone freeradius-clone freeradius
PRIMARY# crm_mon
Online: [ primary secondary ]

vip1 (ocf::heartbeat:IPaddr2): Started primary
 Clone Set: freeradius-clone [freeradius]
 Started: [ primary secondary ]
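
Since the clone keeps freeradius active on both nodes, each node should now be listening on the RADIUS ports (assuming the default 1812/1813 UDP):

PRIMARY/SECONDARY# ss -lnup | grep 1812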

Last tuning.
We define resource colocation, telling the cluster that one resource depends on the location of another resource. This configuration ensures that all the resources involved run on the master cluster node at the same time.

PRIMARY# crm configure colocation vip1-freeradius inf: vip1 freeradius-clone
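
A simple way to test the failover is putting the primary node in standby, checking that the VIP moves to the secondary and then bringing the primary back online:

PRIMARY# crm node standby primary
PRIMARY# crm status
[...]
vip1 (ocf::heartbeat:IPaddr2): Started secondary
[...]
PRIMARY# crm node online primary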

Now we have the cluster up&running, enjoy!
