Friday, August 20, 2010

Moving to Corosync and Pacemaker

After using Heartbeat 1.X for the last 4 years, it looked like it was time to move to the next generation. Heartbeat itself is no longer the package it was, and looking at the projects out there, Corosync with Pacemaker seems like the future.

Using Heartbeat

Most of the machines using Heartbeat were just 2-machine, quorum-type clusters sharing IP(s) between them, so this should have been a simple move. Yeah, right. The move to XML, the maze of configuration options, and the poor examples made this a bit harder than you might think, not that Heartbeat was ever easy. When Heartbeat burned you, it burned hard, because these were machines that were never supposed to go down, and that is always the problem with upgrading machines that are never supposed to go down.

Finding the documentation

The Heartbeat web site is mostly not being maintained any more, and for the 2.X versions the docs, well, sucked. The Corosync site, corosync.org, looks very nice and gives you enough to get Corosync up, but not much else. The Pacemaker site, http://clusterlabs.org/, is good, but there are so many options and the docs go on and on, so it is useful as a reference but NOT as an intro. So it's off to Google, looking through this and that to put together something that will get this going.

Working with and making it work

Heartbeat 1.X was only for 2-machine cluster configurations, but Corosync/Pacemaker handles much more, which makes it more interesting and more challenging to work with. In learning the configuration options of Corosync and Pacemaker I had to understand the flexibility and expandability of the 2 products, which made it a slow process. To start, you first have to get Corosync up. Corosync has a standard configuration file, /etc/corosync/corosync.conf, in the stanza format that seems popular today. Note the totem stanza; it is the important one.
corosync.conf:
compatibility: whitetank

totem {
        version: 2
        secauth: on
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.254.2
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: yes
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}
aisexec {
        user:  root
        group: root
}
service {
        name: pacemaker
        ver: 0
}



The thing to understand here is that Corosync is the networking part of the puzzle, so its configuration cares about IP addresses, secure communications, and who is in the cluster, at the communications layer only. It does launch Pacemaker (that is the service stanza), but that is as far as it goes on the application side. There is still a lot to take in here and lots of ways to configure Corosync, but I'm not going to go into all of that.
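One detail worth checking in the totem interface stanza: the corosync.conf man page describes bindnetaddr as a network address rather than a host address (its example is a host at 192.168.5.92 with netmask 255.255.255.0, giving 192.168.5.0). The Python sketch below is not part of the setup described here; it only shows how that value could be derived from a node's IP. The /24 prefix length is an assumption, since the netmask is not given in this post, and bind_net_addr is a made-up helper name.

```python
import ipaddress


def bind_net_addr(host_ip: str, prefix_len: int) -> str:
    """Return the network address for a host IP and prefix length.

    Hypothetical helper: this is the value the corosync.conf man page
    suggests for bindnetaddr.
    """
    iface = ipaddress.ip_interface(f"{host_ip}/{prefix_len}")
    return str(iface.network.network_address)


if __name__ == "__main__":
    # 192.168.254.2 comes from the config above; /24 is an assumed netmask.
    print(bind_net_addr("192.168.254.2", 24))  # prints 192.168.254.0
```

With a network address in bindnetaddr, the same corosync.conf can be copied unchanged to every node on that subnet.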

Pacemaker config overview
  
Pacemaker does not have a configuration file you edit by hand; it has a command line interface called crm. crm has its own command language that can be used as one-line commands or as an interactive shell. That said, I only want to talk about the new concepts that Pacemaker brings. Pacemaker is built around the idea of having many machines in its cluster, so how do you control which application runs on which machine? This is done with a priority (score) system that you can affect with its control language.

group gr_haproxy IP_haproxy cf_haproxy \
        meta target-role="Started"
location cli-prefer-gr_haproxy gr_haproxy \
        rule $id="cli-prefer-rule-gr_haproxy" inf: #uname eq lb08
location loc_IP_bcom IP_bcom 75: lb08


This bit of the control language defines a group, which ties together resources (applications) so that they start in order and run on the same machine. The location statement is a more direct way to influence which machine an application runs on; in the example statements I am changing the priority number to move the applications to the lb08 machine. There are also order statements to control how applications should be started and what depends on what.
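To make the priority numbers in the location statements above more concrete, here is a toy model in Python. It is a simplified sketch, not Pacemaker's real placement code. It assumes what the Pacemaker documentation says about scores: location scores for a resource are added up per node, INFINITY is 1,000,000, -INFINITY always wins, a node with a negative total cannot run the resource, and the highest total gets it. The node name lb07 and the functions add_scores and pick_node are made up for the illustration. Ties here go to the first node listed, which is not how the real cluster breaks ties.

```python
INFINITY = 1_000_000  # value Pacemaker documents for "inf"


def add_scores(a: int, b: int) -> int:
    """Add two scores using Pacemaker-style infinity rules."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY  # -INFINITY beats everything, even +INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def pick_node(nodes, constraints):
    """Pick the node with the highest total score for one resource.

    nodes: list of node names.
    constraints: list of (node, score) pairs, one per location rule.
    Returns None when no node ends up with a non-negative total.
    """
    totals = {name: 0 for name in nodes}
    for node, score in constraints:
        if node in totals:
            totals[node] = add_scores(totals[node], score)
    allowed = {n: s for n, s in totals.items() if s >= 0}
    if not allowed:
        return None
    return max(allowed, key=allowed.get)


if __name__ == "__main__":
    nodes = ["lb07", "lb08"]  # lb07 is a hypothetical second node
    # Like: location loc_IP_bcom IP_bcom 75: lb08
    print(pick_node(nodes, [("lb08", 75)]))
    # Like the inf: rule for gr_haproxy, competing with a finite preference
    print(pick_node(nodes, [("lb08", INFINITY), ("lb07", 500)]))
```

Both calls print lb08: a score of 75 is enough when nothing else is pulling the resource elsewhere, while inf: pins it to lb08 no matter what finite scores the other nodes have.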

A work in progress
   
So far so good, as they say. I have converted our old Heartbeat machines to the new Corosync/Pacemaker setup on about 5 clusters without real issues, and so far I am impressed with its workings and speed. I am still working on understanding how to take advantage of the power of this pair and, as always, looking for help. Any comments or hints out there?
