
RHEL 7 – Redhat Cluster with Pacemaker – Overview – Part 2


Pacemaker is a robust and powerful open-source resource manager that ships with Redhat Enterprise Linux 7 as the High Availability Add-on. Pacemaker simplifies cluster configuration and cluster management on RHEL 7, which is really good for system administrators. Compared to prior Redhat cluster releases, Redhat cluster 7 looks completely different with the corosync cluster engine and the pacemaker resource manager. In this article, we will see the Redhat cluster core components and their responsibilities.

 

Redhat Cluster Core Components:

 

1. Resource Agents

Resource agents are nothing but scripts that start, stop and monitor the cluster resources.

 

2.Resource Manager 

Pacemaker provides the brain that processes and reacts to events regarding the cluster. These events include nodes joining or leaving the cluster, resource events caused by failures, maintenance or scheduled activities, and other administrative actions. After any of these events, Pacemaker computes the ideal state of the cluster and plots a path to achieve it. This may include moving resources, stopping nodes and even forcing nodes offline with remote power switches.

 

3. Low-level infrastructure:

Corosync provides reliable messaging, membership and quorum information about the cluster.

Redhat Cluster with Pacemaker

 

 

Pacemaker:

 

Pacemaker is responsible for providing maximum availability for your cluster services/resources by detecting and recovering from node and resource-level failures. It uses the messaging and membership capabilities provided by Corosync to keep resources available on any of the cluster nodes.

  • Detection and recovery of node and service-level failures
  • Storage agnostic, no requirement for shared storage
  • Resource agnostic, anything that can be scripted can be clustered
  • Supports fencing (STONITH ) for ensuring data integrity
  • Supports large (32 node) and small clusters  (2  node)
  • Supports both quorate and resource-driven clusters
  • Supports practically any redundancy configuration
  • Automatically replicated configuration that can be updated from any node
  • Ability to specify cluster-wide service ordering, colocation and anti-colocation
  • Support for advanced service types
    • Clones: for services which need to be active on multiple nodes
    • Multi-state: for services with multiple modes (e.g. master/slave, primary/secondary)
  • Unified, scriptable cluster management tools

Notes from http://clusterlabs.org/.

 

Pacemaker’s key components:

 

  • Cluster Information Base (CIB)

Pacemaker uses an XML file (cib.xml) to represent the cluster configuration and the current state of the cluster to all the nodes. This file is kept in sync across all the nodes and is used by the PEngine to compute the ideal state of the cluster and how it should be achieved.

 

  • Cluster Resource Management daemon (CRMd)

The list of instructions computed by the PEngine is fed to the Designated Controller (DC). Pacemaker centralizes all cluster decision making by electing one of the CRMd instances to act as a master. If the elected CRMd instance fails, a new one is established automatically.

 

  • Local Resource Management daemon (LRMd)

The LRMd is responsible for carrying out the instructions it receives (from the PEngine, relayed via the CRMd) by passing them to the local resource agents and reporting the results back.

 

  • Policy Engine (PEngine or PE)

The PEngine uses the CIB XML file to determine the current cluster state and recalculates the ideal cluster state, and the actions needed to achieve it, whenever events produce unexpected results.

 

  • Fencing daemon (STONITHd)

If any node misbehaves, it is better to turn it off instead of letting it corrupt the data on shared storage. Shoot-The-Other-Node-In-The-Head (STONITHd) provides the fencing mechanism in RHEL 7.

 

Pacemaker Internals

 

Corosync:

Corosync is an open-source cluster engine which provides the communication layer between the cluster nodes (messaging, membership and quorum). In the previous Redhat cluster release, "cman" was responsible for the cluster interconnect, messaging and membership capabilities. Pacemaker also supports "heartbeat", another open-source cluster engine, but it is not available in RHEL 7.

 

Types of Redhat Cluster supported with Pacemaker:

 

  1. Active/Passive cluster for DR setup:

In the following cluster model, we use pacemaker and DRBD (remote replication) for the DR solution. If the production site goes down, the Redhat cluster will automatically activate the DR site.

Active/Passive Cluster

 

2. Active/Passive  cluster for Backup solution:

The following diagram shows the Active/Passive shared cluster with a common backup node.

Active/Passive shared cluster with common Backup node

 

3. Active/Active Cluster:

If we have shared storage, every node can potentially be used for failover. Pacemaker can even run multiple copies of services to spread out the workload across multiple nodes.

Active/Active cluster

 

Hope this article is informative to you. Share it ! Comment it !! Be Sociable !!!

The post RHEL 7 – Redhat Cluster with Pacemaker – Overview – Part 2 appeared first on UnixArena.


RHEL 7 – Installing Redhat Cluster Software (Corosync/pacemaker) – Part 3


In this article, we will see how to install the Redhat cluster software (pacemaker) on RHEL 7. If you have a valid Redhat subscription, you can directly configure the Redhat repository and install the packages. The packages are also available on the RHEL 7 ISO image as an Add-on. Unlike previous Redhat cluster releases, the Redhat cluster 7 installation looks very simple since Redhat has moved to pacemaker & corosync. Prior to proceeding with the installation, I would request you to go through the following articles.

 

 

Environment:

  • Operating System: Redhat Enterprise Linux 7.2
  • Repository : Local YUM Repository using RHEL 7.2 DVD ISO image.
  • Type of Cluster : Active / Passive – Two Node cluster
  • Cluster Resource : KVM guest (VirtualDomain)

 

 

YUM Repository configuration for OS , HA & Storage:

 

1. Copy the RHEL 7.2 DVD ISO image to the system or attach it as a DVD device.

2. Mount the ISO image/DVD under "/repo".
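
For example, if you copied the ISO file to the server, it can be loop-mounted under /repo. The ISO path below is only an illustration; adjust it to the location of your copy.

[root@UA-HA ~]# mkdir -p /repo
[root@UA-HA ~]# mount -o loop,ro /var/tmp/rhel-server-7.2-x86_64-dvd.iso /repo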

[root@UA-HA ~]# df -h /repo
Filesystem      Size  Used Avail Use% Mounted on
/dev/sr1        3.8G  3.8G     0 100% /repo
[root@UA-HA ~]#

 

3. List the DVD contents.

[root@UA-HA ~]# ls -lrt /repo
total 872
-r--r--r--  1 root root  18092 Mar  6  2012 GPL
-r--r--r--  1 root root   8266 Apr  4  2014 EULA
-r--r--r--  1 root root   3211 Oct 23 09:25 RPM-GPG-KEY-redhat-release
-r--r--r--  1 root root   3375 Oct 23 09:25 RPM-GPG-KEY-redhat-beta
-r--r--r--  1 root root    114 Oct 30 10:54 media.repo
-r--r--r--  1 root root   1568 Oct 30 11:03 TRANS.TBL
dr-xr-xr-x  2 root root   4096 Oct 30 11:03 repodata
dr-xr-xr-x 24 root root   6144 Oct 30 11:03 release-notes
dr-xr-xr-x  2 root root 835584 Oct 30 11:03 Packages
dr-xr-xr-x  2 root root   2048 Oct 30 11:03 LiveOS
dr-xr-xr-x  2 root root   2048 Oct 30 11:03 isolinux
dr-xr-xr-x  3 root root   2048 Oct 30 11:03 images
dr-xr-xr-x  3 root root   2048 Oct 30 11:03 EFI
dr-xr-xr-x  4 root root   2048 Oct 30 11:03 addons
[root@UA-HA ~]#

4. Create the yum repository file named "ua.repo" with the following contents. (The "cat" command line below is shown only to display the file.)

[root@UA-HA ~]# cat /etc/yum.repos.d/ua.repo
[repo-update]
gpgcheck=0
enabled=1
baseurl=file:///repo
name=repo-update

[repo-ha]
gpgcheck=0
enabled=1
baseurl=file:///repo/addons/HighAvailability
name=repo-ha

[repo-storage]
gpgcheck=0
enabled=1
baseurl=file:///repo/addons/ResilientStorage
name=repo-storage
[root@UA-HA ~]#

 

5. List the configured yum repositories.

[root@UA-HA ~]# yum repolist
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
repo id                                                                         repo name                                                                      status
!repo-ha                                                                        repo-ha                                                                           30
!repo-storage                                                                   repo-storage                                                                      37
!repo-update                                                                    repo-update                                                                    4,620
repolist: 4,687
[root@UA-HA ~]#

We have successfully configured the YUM local repository using RHEL 7.2 ISO image.

 

 

Installing Cluster Packages on Nodes:

 

1. Login to the RHEL 7.2 node as the root user.

 

2. Execute the following command to install the cluster packages and their dependencies. Corosync will be installed along with pacemaker.

[root@UA-HA ~]# yum install -y pacemaker pcs psmisc policycoreutils-python
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Resolving Dependencies
--> Running transaction check
---> Package pacemaker.x86_64 0:1.1.13-10.el7 will be installed
--> Processing Dependency: pacemaker-cli = 1.1.13-10.el7 for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: pacemaker-cluster-libs = 1.1.13-10.el7 for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: pacemaker-libs = 1.1.13-10.el7 for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: corosync for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcfg.so.6(COROSYNC_CFG_0.82)(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcmap.so.4(COROSYNC_CMAP_1.0)(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcpg.so.4(COROSYNC_CPG_1.0)(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libquorum.so.5(COROSYNC_QUORUM_1.0)(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: resource-agents for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcfg.so.6()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcib.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcmap.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcorosync_common.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcpg.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcrmcluster.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcrmcommon.so.3()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcrmservice.so.3()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: liblrmd.so.1()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libpe_rules.so.2()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libpe_status.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libpengine.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libquorum.so.5()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libstonithd.so.2()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libtransitioner.so.2()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
---> Package pcs.x86_64 0:0.9.143-15.el7 will be installed
---> Package policycoreutils-python.x86_64 0:2.2.5-20.el7 will be installed
---> Package psmisc.x86_64 0:22.20-9.el7 will be installed
--> Running transaction check
---> Package corosync.x86_64 0:2.3.4-7.el7 will be installed
---> Package corosynclib.x86_64 0:2.3.4-7.el7 will be installed
---> Package pacemaker-cli.x86_64 0:1.1.13-10.el7 will be installed
---> Package pacemaker-cluster-libs.x86_64 0:1.1.13-10.el7 will be installed
---> Package pacemaker-libs.x86_64 0:1.1.13-10.el7 will be installed
---> Package resource-agents.x86_64 0:3.9.5-54.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================================
 Package                       Arch          Version                Repository          Size
=============================================================================================
Installing:
 pacemaker                     x86_64        1.1.13-10.el7          repo-ha            462 k
 pcs                           x86_64        0.9.143-15.el7         repo-ha            4.7 M
 policycoreutils-python        x86_64        2.2.5-20.el7           repo-update        435 k
 psmisc                        x86_64        22.20-9.el7            repo-update        140 k
Installing for dependencies:
 corosync                      x86_64        2.3.4-7.el7            repo-ha            210 k
 corosynclib                   x86_64        2.3.4-7.el7            repo-ha            124 k
 pacemaker-cli                 x86_64        1.1.13-10.el7          repo-ha            253 k
 pacemaker-cluster-libs        x86_64        1.1.13-10.el7          repo-ha             92 k
 pacemaker-libs                x86_64        1.1.13-10.el7          repo-ha            519 k
 resource-agents               x86_64        3.9.5-54.el7           repo-ha            339 k

Transaction Summary
============================================================================================
Install  4 Packages (+6 Dependent packages)

Total download size: 7.3 M
Installed size: 19 M
Downloading packages:
--------------------------------------------------------------------------------------------
Total                         19 MB/s | 7.3 MB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : corosynclib-2.3.4-7.el7.x86_64                  1/10
  Installing : corosync-2.3.4-7.el7.x86_64                     2/10
  Installing : pacemaker-libs-1.1.13-10.el7.x86_64             3/10
  Installing : pacemaker-cli-1.1.13-10.el7.x86_64              4/10
  Installing : psmisc-22.20-9.el7.x86_64                       5/10
  Installing : resource-agents-3.9.5-54.el7.x86_64             6/10
  Installing : pacemaker-cluster-libs-1.1.13-10.el7.x86_64     7/10
  Installing : pacemaker-1.1.13-10.el7.x86_64                  8/10
  Installing : pcs-0.9.143-15.el7.x86_64                       9/10
  Installing : policycoreutils-python-2.2.5-20.el7.x86_64     10/10
  Verifying  : pcs-0.9.143-15.el7.x86_64                       1/10
  Verifying  : corosync-2.3.4-7.el7.x86_64                     2/10
  Verifying  : pacemaker-cli-1.1.13-10.el7.x86_64              3/10
  Verifying  : psmisc-22.20-9.el7.x86_64                       4/10
  Verifying  : resource-agents-3.9.5-54.el7.x86_64             5/10
  Verifying  : pacemaker-cluster-libs-1.1.13-10.el7.x86_64     6/10
  Verifying  : pacemaker-libs-1.1.13-10.el7.x86_64             7/10
  Verifying  : pacemaker-1.1.13-10.el7.x86_64                  8/10
  Verifying  : policycoreutils-python-2.2.5-20.el7.x86_64      9/10
  Verifying  : corosynclib-2.3.4-7.el7.x86_64                 10/10

Installed:
  pacemaker.x86_64 0:1.1.13-10.el7                  pcs.x86_64 0:0.9.143-15.el7        
policycoreutils-python.x86_64 0:2.2.5-20.el7        psmisc.x86_64 0:22.20-9.el7

Dependency Installed:
  corosync.x86_64 0:2.3.4-7.el7          corosynclib.x86_64 0:2.3.4-7.el7       
  pacemaker-cli.x86_64 0:1.1.13-10.el7   pacemaker-cluster-libs.x86_64 0:1.1.13-10.el7
  pacemaker-libs.x86_64 0:1.1.13-10.el7  resource-agents.x86_64 0:3.9.5-54.el7

Complete!
[root@UA-HA ~]#

 

We have successfully installed the cluster packages.

 

Note: crmsh, an alternative to the pcs commands, is not available in RHEL 7.

 

In my cluster environment, I have disabled the firewall & SELinux to avoid complexity.

[root@UA-HA ~]# setenforce 0
setenforce: SELinux is disabled
[root@UA-HA ~]#
[root@UA-HA ~]# cat /etc/selinux/config |grep SELINUX |grep -v "#"
SELINUX=disabled
SELINUXTYPE=targeted
[root@UA-HA ~]#
[root@UA-HA ~]# systemctl stop firewalld.service
[root@UA-HA ~]# systemctl disable firewalld.service
[root@UA-HA ~]# iptables --flush
[root@UA-HA ~]# 
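
If you prefer to keep firewalld running, RHEL 7 ships a pre-defined "high-availability" firewalld service that opens the ports used by corosync, pacemaker and pcsd. Allowing it on both nodes is an alternative to disabling the firewall:

[root@UA-HA ~]# firewall-cmd --permanent --add-service=high-availability
[root@UA-HA ~]# firewall-cmd --reload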

 

Hope this article is informative to you. In the next article, we will see how to configure the cluster using pacemaker.

Share it ! Comment it !! Be Sociable !!!

The post RHEL 7 – Installing Redhat Cluster Software (Corosync/pacemaker) – Part 3 appeared first on UnixArena.

RHEL 7 – Configuring Pacemaker/Corosync – Redhat Cluster – Part 4


In this article, we will see how to configure a two node Redhat cluster using pacemaker & corosync on RHEL 7.2. Once you have installed the necessary packages, you need to enable the cluster services at system start-up and start them before kicking off the cluster configuration. The "hacluster" user is created automatically during the package installation with its password disabled. The pcs daemon (pcsd) uses this user to authenticate the cluster nodes to each other, sync the cluster configuration, and start and stop the cluster on the cluster nodes.

 

Environment:

  • Operating System: Redhat Enterprise Linux 7.2
  • Type of Cluster :  Two Node cluster – Failover
  • Nodes: UA-HA & UA-HA2  (Assuming that packages have been installed on both the nodes)
  • Cluster Resource : KVM guest (VirtualDomain)  –  See in Next Article.

 

Hardware configuration: 

  1. CPU – 2
  2. Memory – 4GB
  3. NFS – For shared storage

 

Redhat Cluster 7 – RHEL 7 – PCS

 

Enable & Start  the Services on both the Nodes:

 

1. Login to both the cluster nodes as the root user.

2. Enable the pcsd daemon on both the nodes so that it starts automatically across reboots. pcsd is the pacemaker/corosync configuration daemon (not a cluster service itself).

[root@UA-HA ~]# systemctl start pcsd.service
[root@UA-HA ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@UA-HA ~]# systemctl status pcsd.service
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:22:08 EST; 14s ago
 Main PID: 18411 (pcsd)
   CGroup: /system.slice/pcsd.service
           ├─18411 /bin/sh /usr/lib/pcsd/pcsd start
           ├─18415 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
           └─18416 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb

Dec 27 23:22:07 UA-HA systemd[1]: Starting PCS GUI and remote configuration interface...
Dec 27 23:22:08 UA-HA systemd[1]: Started PCS GUI and remote configuration interface.
[root@UA-HA ~]#

 

3. Set a new password for the cluster user "hacluster" on both the nodes.

[root@UA-HA ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@UA-HA ~]#
[root@UA-HA2 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@UA-HA2 ~]#


Configure corosync & Create new cluster:

 

1. Login to any one of the cluster nodes and authenticate the "hacluster" user.

[root@UA-HA ~]# pcs cluster auth UA-HA UA-HA2
Username: hacluster
Password:
UA-HA: Authorized
UA-HA2: Authorized
[root@UA-HA ~]#

 

2. Create a new cluster using the pcs command.

[root@UA-HA ~]# pcs cluster setup --name UABLR UA-HA UA-HA2
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
UA-HA: Succeeded
UA-HA2: Succeeded
Synchronizing pcsd certificates on nodes UA-HA, UA-HA2...
UA-HA: Success
UA-HA2: Success

Restaring pcsd on the nodes in order to reload the certificates...
UA-HA: Success
UA-HA2: Success
[root@UA-HA ~]#

 

3. Check the cluster status .

[root@UA-HA ~]# pcs status
Error: cluster is not currently running on this node
[root@UA-HA ~]#

You see this error because the cluster services have not been started yet.

 

4. Start the cluster using the pcs command. "--all" will start the cluster on all the configured nodes.

[root@UA-HA ~]# pcs cluster start --all
UA-HA2: Starting Cluster...
UA-HA: Starting Cluster...
[root@UA-HA ~]#

 

In the back-end, the "pcs cluster start" command triggers the following commands on each cluster node.

# systemctl start corosync.service
# systemctl start pacemaker.service

 

5. Check the cluster services status.

[root@UA-HA ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:34:31 EST; 11s ago
  Process: 18994 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
 Main PID: 19001 (corosync)
   CGroup: /system.slice/corosync.service
           └─19001 corosync

Dec 27 23:34:31 UA-HA corosync[19001]:  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] Members[1]: 1
Dec 27 23:34:31 UA-HA corosync[19001]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec 27 23:34:31 UA-HA corosync[19001]:  [TOTEM ] A new membership (192.168.203.131:1464) was formed. Members joined: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] This node is within the primary component and will provide service.
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] Members[2]: 2 1
Dec 27 23:34:31 UA-HA corosync[19001]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec 27 23:34:31 UA-HA systemd[1]: Started Corosync Cluster Engine.
Dec 27 23:34:31 UA-HA corosync[18994]: Starting Corosync Cluster Engine (corosync): [  OK  ]
[root@UA-HA ~]# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:34:32 EST; 15s ago
 Main PID: 19016 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─19016 /usr/sbin/pacemakerd -f
           ├─19017 /usr/libexec/pacemaker/cib
           ├─19018 /usr/libexec/pacemaker/stonithd
           ├─19019 /usr/libexec/pacemaker/lrmd
           ├─19020 /usr/libexec/pacemaker/attrd
           ├─19021 /usr/libexec/pacemaker/pengine
           └─19022 /usr/libexec/pacemaker/crmd

Dec 27 23:34:33 UA-HA crmd[19022]:   notice: pcmk_quorum_notification: Node UA-HA2[2] - state is now member (was (null))
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: pcmk_quorum_notification: Node UA-HA[1] - state is now member (was (null))
Dec 27 23:34:33 UA-HA stonith-ng[19018]:   notice: Watching for stonith topology changes
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: Notifications disabled
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: The local CRM is operational
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Dec 27 23:34:33 UA-HA attrd[19020]:  warning: Node names with capitals are discouraged, consider changing 'UA-HA2' to something else
Dec 27 23:34:33 UA-HA attrd[19020]:   notice: crm_update_peer_proc: Node UA-HA2[2] - state is now member (was (null))
Dec 27 23:34:33 UA-HA stonith-ng[19018]:  warning: Node names with capitals are discouraged, consider changing 'UA-HA2' to something else
Dec 27 23:34:34 UA-HA stonith-ng[19018]:   notice: crm_update_peer_proc: Node UA-HA2[2] - state is now member (was (null))
[root@UA-HA ~]#

 

Verify Corosync configuration:

 

1. Check the corosync communication status.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = ring 0 active with no faults
[root@UA-HA ~]#

 

In my setup, the first RING uses the interface "br0".

[root@UA-HA ~]# ifconfig br0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.203.134  netmask 255.255.255.0  broadcast 192.168.203.255
        inet6 fe80::84ef:2eff:fee9:260a  prefixlen 64  scopeid 0x20
        ether 00:0c:29:2d:3f:ce  txqueuelen 0  (Ethernet)
        RX packets 15797  bytes 1877460 (1.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7018  bytes 847881 (828.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@UA-HA ~]#

We can have multiple RINGs to provide redundancy for the cluster communication (similar to LLT links in VCS).
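
As a rough sketch, a second ring (redundant ring protocol) can be defined at cluster creation time by giving each node an alternate address. The "-hb2" host names below are illustrative only and must resolve to the second interface on each node:

# pcs cluster setup --name UABLR UA-HA,UA-HA-hb2 UA-HA2,UA-HA2-hb2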

 

2. Check the membership and quorum APIs.

[root@UA-HA ~]# corosync-cmapctl  | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.203.134)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.203.131)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
[root@UA-HA ~]#
[root@UA-HA ~]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         2          1 UA-HA2
         1          1 UA-HA (local)
[root@UA-HA ~]#

 

 

Verify Pacemaker Configuration:

 

1. Check the running pacemaker processes.

[root@UA-HA ~]# ps axf |grep pacemaker
19324 pts/0    S+     0:00  |       \_ grep --color=auto pacemaker
19016 ?        Ss     0:00 /usr/sbin/pacemakerd -f
19017 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib
19018 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
19019 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
19020 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
19021 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
19022 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd

 

2. Check the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sun Dec 27 23:44:44 2015          Last change: Sun Dec 27 23:34:55 2015 by hacluster via crmd on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:


PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

3. You can see that corosync & pacemaker are active now but disabled across system reboots. If you would like the cluster to start automatically after a reboot, you can enable the services using the systemctl command.

[root@UA-HA2 ~]# systemctl enable corosync
Created symlink from /etc/systemd/system/multi-user.target.wants/corosync.service to /usr/lib/systemd/system/corosync.service.
[root@UA-HA2 ~]# systemctl enable pacemaker
Created symlink from /etc/systemd/system/multi-user.target.wants/pacemaker.service to /usr/lib/systemd/system/pacemaker.service.
[root@UA-HA2 ~]# pcs status
Cluster name: UABLR
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sun Dec 27 23:51:30 2015          Last change: Sun Dec 27 23:34:55 2015 by hacluster via crmd on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:


PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA2 ~]#
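
Alternatively, pcs offers a single command that enables both corosync and pacemaker on all configured nodes, equivalent to running the systemctl enable commands above on each node:

[root@UA-HA ~]# pcs cluster enable --all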

 

4. When the cluster starts, it automatically records the number and details of the nodes in the cluster, as well as which stack is being used and the version of Pacemaker being used. To view the cluster configuration (Cluster Information Base – CIB) in XML format, use the following command.

[root@UA-HA2 ~]# pcs cluster cib

 

5. Verify the cluster information base using the following command.

[root@UA-HA ~]# crm_verify -L -V
   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@UA-HA ~]#

By default, pacemaker enables STONITH (Shoot The Other Node In The Head) / fencing in order to protect the data. Fencing is mandatory when you use shared storage, to avoid data corruption.

For the time being, we will disable STONITH and configure it later.
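
When you are ready to configure fencing, you can list the fence agents available on the system (the output depends on which fence-agents packages are installed):

[root@UA-HA ~]# pcs stonith list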

 

6. Disable the STONITH (Fencing)

[root@UA-HA ~]# pcs property set stonith-enabled=false
[root@UA-HA ~]# 
[root@UA-HA ~]#  pcs property show stonith-enabled
Cluster Properties:
 stonith-enabled: false
[root@UA-HA ~]#

 

7. Verify the cluster configuration again. The errors should now be gone.

[root@UA-HA ~]# crm_verify -L -V
[root@UA-HA ~]#

 

We have successfully configured a two node Redhat cluster on RHEL 7.2 with the new components pacemaker and corosync. Hope this article is informative to you.

Share it ! Comment it !! Be Sociable !!!

 

The post RHEL 7 – Configuring Pacemaker/Corosync – Redhat Cluster – Part 4 appeared first on UnixArena.

RHEL 7 – Pacemaker – Cluster Resource Agents Overview – Part 5


Resource agents play an important role in cluster management. Resource agents provide the logic to manage the resources, and Pacemaker has one agent per resource type. A resource type could be a file-system, IP address, database, virtual domain and more. The resource agent is responsible for monitoring, starting, stopping, validating, migrating, promoting and demoting the cluster resources whenever required. Most of the resource agents are compliant with the Open Cluster Framework (OCF). Let's add one IP resource to the existing cluster and then get into a detailed explanation of the command options.

 

1. Login to one of the Redhat cluster (Pacemaker/corosync) nodes as the root user.

 

2. Check the cluster status .

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 13:06:01 2015          Last change: Sun Dec 27 23:59:59 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:


PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

3. Add the IP which needs to be highly available (the clustered IP).

[root@UA-HA ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.203.190 cidr_netmask=24 op monitor interval=30s
[root@UA-HA ~]#

ClusterIP – Resource name (you can give any name)
ocf:heartbeat:IPaddr2 – Resource agent name

 

Resource Standard:

The first field (ocf in this case) is the standard to which the resource script conforms and where to find it.
To obtain a list of the available resource standards , use the following command.

[root@UA-HA ~]# pcs resource standards
ocf   - Open cluster Framework 
lsb   - Linux standard base (legacy init scripts)
service - Based on Linux "service" command. 
systemd  - systemd based service Management
stonith  - Fencing Resource standard. 
[root@UA-HA ~]#

 

Resource Providers:

The second field (heartbeat in this case) is standard-specific; for OCF resources, it tells the cluster which OCF namespace the resource script is in. To obtain a list of the available OCF resource providers, use the following command.

[root@UA-HA ~]# pcs resource providers
heartbeat
openstack
pacemaker
[root@UA-HA ~]#

 

What are the pre-built resource agents available in RHEL 7.2?

The third field (IPaddr2 in this case) is the name of the resource script. To see all the resource agents available for a specific OCF provider (heartbeat) , use the following command.

[root@UA-HA ~]# pcs resource agents ocf:heartbeat
CTDB
Delay
Dummy
Filesystem
IPaddr
IPaddr2
IPsrcaddr
LVM
MailTo
Route
SendArp
Squid
VirtualDomain
Xinetd
apache
clvm
conntrackd
db2
dhcpd
docker
ethmonitor
exportfs
galera
iSCSILogicalUnit
iSCSITarget
iface-vlan
mysql
named
nfsnotify
nfsserver
nginx
oracle
oralsnr
pgsql
postfix
rabbitmq-cluster
redis
rsyncd
slapd
symlink
tomcat
[root@UA-HA ~]# pcs resource agents ocf:heartbeat |wc -l
41
[root@UA-HA ~]#

 

For OpenStack, the following resource agents are available.

[root@UA-HA ~]# pcs resource agents ocf:openstack
NovaCompute
NovaEvacuate
[root@UA-HA ~]#

 

Here is the list of resource agents used to manage the pacemaker components.

[root@UA-HA ~]# pcs resource agents ocf:pacemaker
ClusterMon
Dummy
HealthCPU
HealthSMART
Stateful
SysInfo
SystemHealth
controld
ping
pingd
remote
[root@UA-HA ~]#

 

4. Verify the resource status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 13:07:33 2015          Last change: Mon Dec 28 13:07:30 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 1 resource configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

 

As per the cluster status, the IP resource is online on node "UA-HA". Let's verify from the OS command line.

[root@UA-HA ~]# ip a |grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
    inet 192.168.203.134/24 brd 192.168.203.255 scope global dynamic br0
    inet 192.168.203.190/24 brd 192.168.203.255 scope global secondary br0
    inet6 fe80::84ef:2eff:fee9:260a/64 scope link
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
[root@UA-HA ~]#
[root@UA-HA ~]# ping 192.168.203.190
PING 192.168.203.190 (192.168.203.190) 56(84) bytes of data.
64 bytes from 192.168.203.190: icmp_seq=1 ttl=64 time=0.084 ms
64 bytes from 192.168.203.190: icmp_seq=2 ttl=64 time=0.090 ms
64 bytes from 192.168.203.190: icmp_seq=3 ttl=64 time=0.121 ms
64 bytes from 192.168.203.190: icmp_seq=4 ttl=64 time=0.094 ms
^C
--- 192.168.203.190 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3006ms
rtt min/avg/max/mdev = 0.084/0.097/0.121/0.015 ms
[root@UA-HA ~]#

 

We can see that the IP "192.168.203.190/24" is up & running. This IP will automatically move from one node to another if the node fails.
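
You can also relocate the IP manually to observe this behaviour. Note that "pcs resource move" creates a location constraint, which can be removed afterwards with "pcs resource clear":

[root@UA-HA ~]# pcs resource move ClusterIP UA-HA2
[root@UA-HA ~]# pcs resource clear ClusterIP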

The post RHEL 7 – Pacemaker – Cluster Resource Agents Overview – Part 5 appeared first on UnixArena.

RHEL 7 – Pacemaker – Cluster Resources/Group Management – Part 6


In a Pacemaker/Corosync cluster (RHEL 7 HA), resource management and resource group management are important tasks. Depending on the cluster HA services, you might need to configure any number of resources. In most cases, you need to start a set of resources sequentially and stop them in the reverse order. To simplify this configuration, Pacemaker supports the concept of groups (resource groups). For example, to provide a web-service in HA mode, you need resources like a file system (to store the website data), an IP (clustered IP to access the website) and Apache (to provide the web service). To start the Apache service, you need the filesystem which stores the website data. So the resources must start in the following order:

  1. IP
  2. File-system
  3. Apache service

 

Let's see how to configure a highly available Apache service (website) in a Redhat cluster (Pacemaker/Corosync). In the previous article, we have already created the IP resource.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 18:24:10 2015          Last change: Mon Dec 28 18:09:30 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 1 resource configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]# pcs resource show ClusterIP
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.203.190 cidr_netmask=24
  Operations: start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
              monitor interval=30s (ClusterIP-monitor-interval-30s)
[root@UA-HA ~]#

 

Create the File-system and  Apache resources quickly:

 

Filesystem : 

  • Shared LUN – /dev/sdc
  • Volume Group – webvg
  • Volume – webvol1
  • Filesystem Type – ext4

 

Quick Setup for Filesystem resource: 

[root@UA-HA2 ~]# vgcreate webvg /dev/sdc
[root@UA-HA2 ~]# lvcreate -L 90M -n webvol1 webvg
[root@UA-HA2 ~]# mkfs.ext4 /dev/webvg/webvol1

 

Apache:

  • httpd

Quick Setup:

[root@UA-HA www]# yum install -y httpd

 

Prerequisites for LVM:

(Perform the following changes on both the cluster nodes)

1. Make sure that the "use_lvmetad" parameter is set to "0". This is mandatory when you use Pacemaker.

[root@UA-HA ~]# grep use_lvmetad /etc/lvm/lvm.conf |grep -v "#"
    use_lvmetad = 0
[root@UA-HA ~]#
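
If lvmetad is still enabled on your system, the lvmconf helper shipped with lvm2 can make the change and stop the lvmetad service in one step (run it on both nodes; this is the usual approach for HA-LVM setups):

[root@UA-HA ~]# lvmconf --enable-halvm --services --startstopservices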

 

2. To prevent automatic volume group activation, update the volume_list parameter with only the local VGs which should be activated automatically. Cluster-managed volume groups (webvg in this case) must not be listed here.

[root@UA-HA ~]# grep volume_list /etc/lvm/lvm.conf |grep -v "#"
        volume_list = [ "nfsvg", "rhel" ]
[root@UA-HA ~]# vgs
  VG    #PV #LV #SN Attr   VSize  VFree
  nfsvg   2   1   0 wz--n-  1.94g 184.00m
  rhel    1   2   0 wz--n- 19.51g      0
  webvg   1   1   0 wz--n- 92.00m      0
[root@UA-HA ~]#

In my case, "webvg" will be managed through the cluster.

 

3. Mount the volume on "/var/www" and create the following directories and files.

[root@UA-HA2 ~]# mount /dev/webvg/webvol1 /var/www
[root@UA-HA2 ~]# cd /var/www
[root@UA-HA2 www]# mkdir error html cgi-bin
[root@UA-HA2 www]# ls -l
total 3
drwxr-xr-x 2 root root 1024 Dec 28 20:26 cgi-bin
drwxr-xr-x 2 root root 1024 Dec 28 20:26 error
drwxr-xr-x 2 root root 1024 Dec 28 20:27 html
[root@UA-HA2 www]# cd html/
[root@UA-HA2 html]# vi index.html
Hello, Welcome to UnixArena 

[root@UA-HA2 html]#

 

4. Rebuild the "initramfs" boot image to guarantee that it will not try to activate a volume group controlled by the cluster. Update the initramfs image using the following command.

[root@UA-HA ~]# dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
[root@UA-HA ~]#

 

5. Reboot the nodes.

 

 

Create the LVM cluster resources (VG & LV) and the File-system cluster resource:

 

1. Create the cluster volume group resource.

[root@UA-HA ~]# pcs resource create vgres LVM volgrpname=webvg exclusive=true
[root@UA-HA ~]# pcs resource show vgres
 Resource: vgres (class=ocf provider=heartbeat type=LVM)
  Attributes: volgrpname=webvg exclusive=true
  Operations: start interval=0s timeout=30 (vgres-start-interval-0s)
              stop interval=0s timeout=30 (vgres-stop-interval-0s)
              monitor interval=10 timeout=30 (vgres-monitor-interval-10)
[root@UA-HA ~]#

vgres – Resource name (any unique name)
webvg – Volume group name

 

2. Create the cluster mount resource.

[root@UA-HA ~]# pcs resource create webvolfs Filesystem  device="/dev/webvg/webvol1" directory="/var/www" fstype="ext4"
[root@UA-HA ~]# pcs resource show webvolfs
 Resource: webvolfs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/webvg/webvol1 directory=/var/www fstype=ext4
  Meta Attrs: 
  Operations: start interval=0s timeout=60 (webvolfs-start-interval-0s)
              stop interval=0s timeout=60 (webvolfs-stop-interval-0s)
              monitor interval=20 timeout=40 (webvolfs-monitor-interval-20)
[root@UA-HA ~]#

 

3. Before adding the Apache resource, you must update /etc/httpd/conf/httpd.conf on both nodes with the following contents. These entries are required for pacemaker to get the web-server status.

Update apache conf
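
The entries added are typically along these lines, enabling the /server-status page for local requests so the apache resource agent can poll it (syntax shown for the Apache 2.4 shipped with RHEL 7; verify against your own httpd.conf):

<Location /server-status>
    SetHandler server-status
    Require local
</Location>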

 

4. Check the Apache server status (httpd.service). Make sure that httpd.service is stopped & disabled on both the cluster nodes, since this service will be managed by the cluster.

[root@UA-HA ~]# systemctl status httpd.service
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:httpd(8)
           man:apachectl(8)

Dec 27 13:55:52 UA-HA systemd[1]: Starting The Apache HTTP Server...
Dec 27 13:55:55 UA-HA httpd[2002]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.203.134. Set the...is message
Dec 27 13:55:55 UA-HA systemd[1]: Started The Apache HTTP Server.
Dec 27 15:16:02 UA-HA httpd[11786]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.203.134. Set th...is message
Dec 27 15:16:02 UA-HA systemd[1]: Reloaded The Apache HTTP Server.
Dec 28 18:06:57 UA-HA systemd[1]: Started The Apache HTTP Server.
Dec 28 20:30:56 UA-HA systemd[1]: Stopping The Apache HTTP Server...
Dec 28 20:30:57 UA-HA systemd[1]: Stopped The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@UA-HA ~]#

 

5. Create the Apache cluster resource.

[root@UA-HA ~]# pcs resource create webres apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status"
[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 20:11:51 2015          Last change: Mon Dec 28 20:11:44 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 vgres  (ocf::heartbeat:LVM):   (target-role:Stopped) Stopped
 webvolfs       (ocf::heartbeat:Filesystem):    (target-role:Stopped) Stopped
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started UA-HA2
 webres (ocf::heartbeat:apache):        Stopped

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

In normal cases, the resource group would be created when you add the first cluster resource (by specifying --group at the end of the command line) to build the dependency tree. To better illustrate cluster resource and resource group management, I am creating the resource group at the end.
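
For reference, creating the very first resource directly inside a group would look something like the command below (illustrative only; it is not how we built the group in this article):

[root@UA-HA ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.203.190 cidr_netmask=24 op monitor interval=30s --group WEBRG1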

 

If any resource is already started, just stop it to avoid errors.

[root@UA-HA ~]# pcs resource disable vgres webvolfs webres ClusterIP
[root@UA-HA ~]# pcs resource
 vgres  (ocf::heartbeat:LVM):                    Stopped
 webvolfs       (ocf::heartbeat:Filesystem):     Stopped
 ClusterIP      (ocf::heartbeat:IPaddr2):        Stopped
 webres (ocf::heartbeat:apache):                 Stopped
[root@UA-HA ~]#

 

6. Create the resource group to define the resource dependencies, so that the resources stop & start in sequence.

[root@UA-HA ~]# pcs resource group add WEBRG1 ClusterIP vgres webvolfs webres

 

As per the above command, here is the resource start-up sequence:

  1. ClusterIP – Website URL
  2. vgres – Volume Group
  3. webvolfs – Mount Resource
  4. webres – httpd Resource

 

The stop sequence is just the reverse of the start sequence:

  1. webres – httpd Resource
  2. webvolfs – Mount Resource
  3. vgres – Volume Group
  4. ClusterIP – Website URL

 

7. Check the resource status. You should be able to see that all the resources are bundled into one resource group named "WEBRG1".

[root@UA-HA ~]# pcs resource
 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):            Stopped
     webvolfs   (ocf::heartbeat:Filesystem):     Stopped
     webres     (ocf::heartbeat:apache):         Stopped
[root@UA-HA ~]#

 

8. Enable the disabled resources in the following sequence.

[root@UA-HA ~]# pcs resource enable ClusterIP
[root@UA-HA ~]# pcs resource enable vgres
[root@UA-HA ~]# pcs resource enable webvolfs
[root@UA-HA ~]# pcs resource enable webres

 

9. Verify the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 20:54:43 2015          Last change: Mon Dec 28 20:51:30 2015 by root via crm_resource on UA-HA2
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

10. Let's move the resources from UA-HA2 to UA-HA. In this case, we do not need to move each resource manually. We just need to move the resource group, since we have bundled the required resources into it.

[root@UA-HA ~]# pcs resource move WEBRG1 UA-HA
[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 20:58:55 2015          Last change: Mon Dec 28 20:58:41 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

You should be able to see a webpage like the following.

Website Portal

 

11. How to stop the pacemaker resource group? Just disable the resource group.

[root@UA-HA2 ~]# pcs resource disable WEBRG1
[root@UA-HA2 ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 21:12:18 2015          Last change: Mon Dec 28 21:12:14 2015 by root via crm_resource on UA-HA2
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       (target-role:Stopped) Stopped
     vgres      (ocf::heartbeat:LVM):   (target-role:Stopped) Stopped
     webvolfs   (ocf::heartbeat:Filesystem):    (target-role:Stopped) Stopped
     webres     (ocf::heartbeat:apache):        (target-role:Stopped) Stopped

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA2 ~]#

 

12. How to start the resource group? Use the enable option on the RG.

[root@UA-HA2 ~]# pcs resource enable WEBRG1
[root@UA-HA2 ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 21:14:04 2015          Last change: Mon Dec 28 21:14:01 2015 by root via crm_resource on UA-HA2
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA2 ~]#

 

Note:
A Redhat cluster (Pacemaker/corosync) has many parameters, such as resource stickiness and failure counts. These attributes play a role in deciding where the resources are started.

 

To clear the errors, use the following command:

# pcs resource cleanup 

 

To remove the location constraints created by resource move/ban operations, use the following commands.

 [root@UA-HA2 ~]# pcs resource clear ClusterIP
[root@UA-HA2 ~]# pcs resource clear vgres
[root@UA-HA2 ~]# pcs resource clear webvolfs
[root@UA-HA2 ~]# pcs resource clear webres
[root@UA-HA2 ~]#
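
The actual per-resource fail counts are tracked separately. They can be viewed and reset individually, and "pcs resource cleanup" (shown above) also clears them along with the failed-operation history:

[root@UA-HA2 ~]# pcs resource failcount show webres
[root@UA-HA2 ~]# pcs resource failcount reset webres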

 

Hope this article is informative to you.

 

Share it ! Comment it !! Be Sociable !!!

The post RHEL 7 – Pacemaker – Cluster Resources/Group Management – Part 6 appeared first on UnixArena.

RHEL 7 – Pacemaker – Configuring HA KVM guest – Part 7


If you have followed the KVM article series on UnixArena, you might have read the article which talks about KVM guest live migration. KVM supports guest live migration (similar to VMware vMotion), but to provide high availability you need a cluster setup (like VMware HA). In this article, we will configure a KVM guest as a cluster resource with live migration support. If you move the KVM guest resource manually, the cluster will perform a live migration, and if any hardware or hypervisor failure happens on the KVM host, the guest will be started on an available cluster node (with minimal downtime). I will be using the existing KVM and Redhat cluster setup to demonstrate this.

 

  • KVM Hyper-visor – RHEL 7.2
  • Redhat cluster Nodes – UA-HA & UA-HA2
  • Shared storage – NFS (as an alternative, you can also use GFS2)
  • KVM guest – UAKVM2

 

HA KVM guest using Pacemaker

 

1. Login to one of the cluster nodes and halt the KVM guest.

[root@UA-HA ~]# virsh shutdown UAKVM2
[root@UA-HA ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     UAKVM2                         shut off

[root@UA-HA ~]#

 

2. Copy the guest domain configuration file (XML) to the NFS path.

[root@UA-HA qemu_config]# cd /etc/libvirt/qemu/
[root@UA-HA qemu]# ls -lrt
total 8
drwx------. 3 root root   40 Dec 14 09:13 networks
drwxr-xr-x. 2 root root    6 Dec 16 16:16 autostart
-rw-------  1 root root 3676 Dec 23 02:52 UAKVM2.xml
[root@UA-HA qemu]#
[root@UA-HA qemu]# cp UAKVM2.xml /kvmpool/qemu_config
[root@UA-HA qemu]# ls -lrt /kvmpool/qemu_config
total 4
-rw------- 1 root root 3676 Dec 23 08:14 UAKVM2.xml
[root@UA-HA qemu]#

 

3. Un-define the KVM virtual guest (so that it can be configured as a cluster resource).

[root@UA-HA qemu]# virsh undefine UAKVM2
Domain UAKVM2 has been undefined

[root@UA-HA qemu]# virsh list --all
 Id    Name                           State
----------------------------------------------------

[root@UA-HA qemu]#

 

4. Check the pacemaker cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 22:44:59 2015          Last change: Mon Dec 28 21:16:56 2015 by root via crm_resource on UA-HA2
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

5. To manage the KVM guest, you need to use the resource agent called "VirtualDomain". Let's create a new VirtualDomain resource using the UAKVM2.xml file that we stored in /kvmpool/qemu_config.

[root@UA-HA ~]# pcs resource create UAKVM2_res VirtualDomain hypervisor="qemu:///system" config="/kvmpool/qemu_config/UAKVM2.xml" migration_transport=ssh op start timeout="120s" op stop timeout="120s" op monitor  timeout="30" interval="10"  meta allow-migrate="true" priority="100" op migrate_from interval="0" timeout="120s" op migrate_to interval="0" timeout="120" --group UAKVM2
[root@UA-HA ~]#

 

6. Check the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 22:51:36 2015          Last change: Mon Dec 28 22:51:36 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

7. The KVM guest "UAKVM2" should have been defined and started automatically. Check the running VM using the following command.

[root@UA-HA ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 2     UAKVM2                         running

[root@UA-HA ~]#

 

8. Pacemaker also supports live KVM guest migration. To migrate the KVM guest to the other KVM host on the fly, use the following command.

[root@UA-HA ~]# pcs resource move UAKVM2 UA-HA2
[root@UA-HA ~]#

In the above command,

UAKVM2 refers to the resource group name and UA-HA2 refers to the cluster node name.

 

9. Check the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 22:54:51 2015          Last change: Mon Dec 28 22:54:38 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

10. List the VMs using the virsh command. You can see that the VM has moved from UA-HA to UA-HA2.

[root@UA-HA ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------

[root@UA-HA ~]# ssh UA-HA2 virsh list
 Id    Name                           State
----------------------------------------------------
 2     UAKVM2                         running

[root@UA-HA ~]#

During this migration , you will not even notice a single packet drop. That’s really cool.

 

Hope this article is informative to you . Share it ! Comment it !! Be Sociable !!!

The post RHEL 7 – Pacemaker – Configuring HA KVM guest – Part 7 appeared first on UnixArena.

RHEL 7 – Pacemaker – Cluster Node Management – Part 8


This article demonstrates Pacemaker/Corosync cluster membership, node management and other cluster operational tasks. Periodically, you might need to take a cluster node offline to perform maintenance activities like OS package updates/upgrades, hardware replacement/upgrade etc. In such cases, you need to put the cluster node into standby mode to keep the cluster operational on the other node and avoid quorum issues (in the case of a two node cluster). The cluster standby setting is persistent across a cluster node reboot, so we do not need to worry about automatic resource start-up until we take the node out of standby.

In the last section, we will look at the cluster maintenance mode, which is completely different from the node standby & unstandby operations. Cluster maintenance mode is the preferred method if you are making online changes on the cluster nodes.

Pre-configured resources are vgres (LVM volume group), webvolfs (filesystem on the logical volume), ClusterIP (HA IP address for the website), webres (Apache) and UAKVM2_res (HA KVM guest).

[root@UA-HA ~]# pcs resource
 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2
[root@UA-HA ~]#

 

Cluster nodes are UA-HA & UA-HA2.

[root@UA-HA ~]# pcs cluster status
Cluster Status:
 Last updated: Sat Oct 17 11:58:23 2015         Last change: Sat Oct 17 11:57:48 2015 by root via crm_attribute on UA-HA
 Stack: corosync
 Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
 2 nodes and 5 resources configured
 Online: [ UA-HA UA-HA2 ]

PCSD Status:
  UA-HA: Online
  UA-HA2: Online
[root@UA-HA ~]#

 

Move a Cluster Node into Standby Mode:

1. Log in to one of the cluster nodes as the root user and check the node status.

[root@UA-HA ~]# pcs status nodes
Pacemaker Nodes:
 Online: UA-HA UA-HA2
 Standby:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Offline:
[root@UA-HA ~]#

 

2. Verify the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 12:00:35 2015          Last change: Sat Oct 17 11:57:48 2015 by root via crm_attribute on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

3. You can also use crm_mon to monitor the cluster status in real time.

[root@UA-HA ~]# crm_mon
Last updated: Sat Oct 17 12:05:50 2015          Last change: Sat Oct 17 12:04:28 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

 

To terminate the crm_mon, press control+c.

[root@UA-HA ~]# crm_mon
Connection to the CIB terminated
[root@UA-HA ~]#

 

4. To move a specific node into standby mode, use the following command.

[root@UA-HA ~]# pcs cluster standby UA-HA2
[root@UA-HA ~]#

 

Check the cluster status again,

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 12:09:35 2015          Last change: Sat Oct 17 12:09:23 2015 by root via crm_attribute on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Node UA-HA2: standby
Online: [ UA-HA ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

You can see that resource group "UAKVM2" has automatically moved from UA-HA2 to UA-HA. You can now perform the maintenance activity on UA-HA2 without worrying about cluster membership and automatic resource start-up.

 

5. Check the cluster membership status. (Quorum status).

[root@UA-HA ~]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         2          1 UA-HA2
         1          1 UA-HA (local)
[root@UA-HA ~]#

OR

[root@UA-HA ~]# corosync-quorumtool
Quorum information
------------------
Date:             Sat Oct 17 12:15:54 2015
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          2296
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
    Nodeid      Votes Name
         2          1 UA-HA2
         1          1 UA-HA (local)
[root@UA-HA ~]#

 

Even though node UA-HA2 is in standby mode, it still provides its vote to the cluster. If you halt the node "UA-HA2" for the maintenance activity, the quorum status will change as shown below.

[root@UA-HA ~]# corosync-quorumtool
Quorum information
------------------
Date:             Sat Oct 17 12:16:25 2015
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          1
Ring ID:          2300
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
    Nodeid      Votes Name
         1          1 UA-HA (local)
[root@UA-HA ~]#

 

Clear the Standby Mode:

1. Once the maintenance on UA-HA2 is completed, just unstandby it to make the cluster node available for operation.

[root@UA-HA ~]# pcs cluster unstandby UA-HA2
[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 12:29:21 2015          Last change: Sat Oct 17 12:29:19 2015 by root via crm_attribute on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

2. You can now move the desired resource group back to UA-HA2.

[root@UA-HA ~]# pcs resource move UAKVM2 UA-HA2
[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 12:32:05 2015          Last change: Sat Oct 17 12:29:19 2015 by root via crm_attribute on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

We have successfully put the node "UA-HA2" into standby mode and reverted it back.

 

How to stop/start the cluster services on a specific node?

1. Check the cluster status.

[root@UA-HA log]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 16:53:02 2015          Last change: Sat Oct 17 16:52:21 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA log]#

 

2. Let's plan to stop the cluster services on UA-HA. As per the cluster status, group "UAKVM2" is running on UA-HA.

 

3. Stop the cluster services on UA-HA and see what happens to the group. From the UA-HA node, execute the following command.

[root@UA-HA log]# pcs cluster stop
Stopping Cluster (pacemaker)... Stopping Cluster (corosync)...
[root@UA-HA log]# pcs status
Error: cluster is not currently running on this node
[root@UA-HA log]#

 

Since the cluster stack (pacemaker & corosync) is stopped on UA-HA, you can't check the cluster status from that node. Let's check it from the UA-HA2 node.

[root@UA-HA log]# ssh UA-HA2 pcs status
Cluster name: UABLR
Last updated: Sun Jan 10 12:13:52 2016          Last change: Sun Jan 10 12:05:47 2016 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA2 ]
OFFLINE: [ UA-HA ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA log]#

 

Group "UAKVM2" has been automatically moved to UA-HA2. What happens if you start the cluster services on UA-HA?

[root@UA-HA log]# pcs cluster start
Starting Cluster...
[root@UA-HA log]# pcs constraint
Location Constraints:
Ordering Constraints:
Colocation Constraints:
[root@UA-HA log]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 17:03:45 2015          Last change: Sun Jan 10 12:05:47 2016 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA log]#

Group UAKVM2 automatically moved back to UA-HA.

 

If you do not want the resource group to move around automatically, follow the procedure below:

1. "BAN" the resource group from the node on which you would like to stop the cluster services.

[root@UA-HA log]# pcs resource ban UAKVM2 UA-HA
Warning: Creating location constraint cli-ban-UAKVM2-on-UA-HA with a score of -INFINITY for resource UAKVM2 on node UA-HA.
This will prevent UAKVM2 from running on UA-HA until the constraint is removed. This will be the case even if UA-HA is the last node in the cluster.

 

2. The resource group will automatically move to the other node in the cluster.

[root@UA-HA log]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 17:18:25 2015          Last change: Sat Oct 17 17:17:48 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA log]#

 

3. The cluster creates a location constraint to prevent the group from starting on that specific node.

[root@UA-HA log]# pcs constraint
Location Constraints:
  Resource: UAKVM2
    Disabled on: UA-HA (score:-INFINITY) (role: Started)
Ordering Constraints:
Colocation Constraints:

 

4. Stop the cluster services on that specific node.

5. Once the maintenance is done, start the cluster services again.

6. Clear the ban constraint and move the resource group back at the desired time, as shown in the sketch below.
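
Putting those steps together, a minimal sketch of the whole sequence could look like this (resource and node names are the ones used in this setup):

[root@UA-HA ~]# pcs cluster stop                 # stop pacemaker & corosync on UA-HA only
... perform the maintenance work on UA-HA ...
[root@UA-HA ~]# pcs cluster start                # rejoin the cluster; UAKVM2 stays on UA-HA2 due to the ban
[root@UA-HA ~]# pcs resource clear UAKVM2        # remove the cli-ban constraint created by "pcs resource ban"
[root@UA-HA ~]# pcs resource move UAKVM2 UA-HA   # optionally move the group back at the desired time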

 

Cluster Maintenance Mode: (Online)

If you would like to perform software upgrades or configuration changes which impact the cluster resources, you need to put the cluster into maintenance mode. All the resources will then be tagged as unmanaged by pacemaker, which means Pacemaker monitoring is turned off and no action will be taken by the cluster until you remove the maintenance mode. This is a useful feature for upgrading cluster components and performing other resource changes.

1. To move the cluster into maintenance mode, use the following command.

[root@UA-HA ~]# pcs property set maintenance-mode=true

 

2. Check the Cluster Property

[root@UA-HA ~]# pcs property list
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: UABLR
dc-version: 1.1.13-10.el7-44eb2dd
have-watchdog: false
last-lrm-refresh: 1452507397
maintenance-mode: true
stonith-enabled: false

 

3. Check the cluster status. The resources are now flagged as unmanaged.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sun Oct 18 12:19:33 2015 Last change: Sun Oct 18 12:19:27 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2 (unmanaged)
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2 (unmanaged)
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2 (unmanaged)
     webres     (ocf::heartbeat:apache):        Started UA-HA2 (unmanaged)
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA (unmanaged)

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

4. The resources continue to run even though you have stopped the cluster services on both nodes.

[root@UA-HA ~]# pcs cluster stop --all
UA-HA: Stopping Cluster (pacemaker)...
UA-HA2: Stopping Cluster (pacemaker)...
UA-HA2: Stopping Cluster (corosync)...
UA-HA: Stopping Cluster (corosync)...
[root@UA-HA ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 55    UAKVM2                         running

[root@UA-HA ~]#

Perform the maintenance activity which can be done without rebooting the system.

 

5. Start the cluster services.

[root@UA-HA ~]# pcs cluster start --all
UA-HA2: Starting Cluster...
UA-HA: Starting Cluster...
[root@UA-HA ~]#

 

6. The resources should still show as unmanaged & started.

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2 (unmanaged)
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2 (unmanaged)
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2 (unmanaged)
     webres     (ocf::heartbeat:apache):        Started UA-HA2 (unmanaged)
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA (unmanaged)

 

7. Clear the Maintenance mode.

[root@UA-HA ~]# pcs property set maintenance-mode=false

OR

[root@UA-HA ~]# pcs property unset maintenance-mode

 

8. Verify the resource status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sun Oct 18 12:41:59 2015          Last change: Sun Oct 18 12:41:51 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

Hope this article is informative to you. Share it ! Comment it !! Be Sociable !!!

The post RHEL 7 – Pacemaker – Cluster Node Management – Part 8 appeared first on UnixArena.

RHEL 7 – Pacemaker – Configure Redundant Corosync Links on Fly– Part 10

$
0
0

The Corosync cluster engine provides reliable inter-node communication between the cluster nodes. It syncs the cluster configuration across the cluster nodes all the time. It also maintains the cluster membership and notifies when quorum is achieved or lost. It provides the messaging layer inside the cluster to manage system and resource availability. In Veritas cluster, this functionality is provided by LLT + GAB (Low Latency Transport + Global Atomic Broadcast). Unlike Veritas cluster, Corosync uses the existing network interfaces to communicate with the cluster nodes.

 

Why do we need redundant corosync links?

By default, we configure network bonding by aggregating a couple of physical network interfaces for the primary node IP. Corosync uses this interface as the heartbeat link in the default configuration. If there is a network issue and the nodes lose connectivity with each other, the cluster may end up in a split-brain situation. To avoid split brain, we configure additional network links. This additional link should go through a different network switch, or we can use a direct network cable between the two nodes.

Note: For tutorial simplicity, we will use unicast (not multicast) for corosync. The unicast method should be fine for two-node clusters.

 

Configuring the additional corosync links is an online activity and can be done without impacting the services.

 

Let’s explore the existing configuration:

1. View the corosync configuration using pcs command.

[root@UA-HA ~]# pcs cluster corosync
totem {
    version: 2
    secauth: off
    cluster_name: UABLR
    transport: udpu
}

nodelist {
    node {
        ring0_addr: UA-HA
        nodeid: 1
    }

    node {
        ring0_addr: UA-HA2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

[root@UA-HA ~]#

 

2. Corosync uses two UDP ports: mcastport (for mcast receives) and mcastport - 1 (for mcast sends).

  • mcast receives: 5405
  • mcast sends: 5404
[root@UA-HA ~]# netstat -plantu | grep 54 |grep corosync
udp        0      0 192.168.203.134:5405    0.0.0.0:*                           34363/corosync
[root@UA-HA ~]#

 

3. Corosync configuration file is located in /etc/corosync.

[root@UA-HA ~]# cat /etc/corosync/corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: UABLR
    transport: udpu
}

nodelist {
    node {
        ring0_addr: UA-HA
        nodeid: 1
    }

    node {
        ring0_addr: UA-HA2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
[root@UA-HA ~]#

 

4. Verify current ring Status using corosync-cfgtool.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = ring 0 active with no faults
[root@UA-HA ~]# ssh UA-HA2 corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
        id      = 192.168.203.131
        status  = ring 0 active with no faults
[root@UA-HA ~]#

 

As we can see, only one ring has been configured for corosync and it uses the following interface on each node.

[root@UA-HA ~]# ifconfig br0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.203.134  netmask 255.255.255.0  broadcast 192.168.203.255
        

[root@UA-HA ~]# ssh UA-HA2 ifconfig br0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.203.131  netmask 255.255.255.0  broadcast 192.168.203.255
        
[root@UA-HA ~]#

 

Configure a new ring :

 

5. To add additional redundancy for corosync links, we will use the following interface on both nodes.

[root@UA-HA ~]# ifconfig eno33554984
eno33554984: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.0.3  netmask 255.255.255.0  broadcast 172.16.0.255
        
[root@UA-HA ~]# ssh UA-HA2 ifconfig eno33554984
eno33554984: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.0.2  netmask 255.255.255.0  broadcast 172.16.0.255
       
[root@UA-HA ~]#

Dedicated Private address for Corosync Links:
172.16.0.3 – UA-HA-HB2
172.16.0.2 – UA-HA2-HB2

 

6. Before making changes to the corosync configuration, we need to move the cluster into maintenance mode.

[root@UA-HA ~]# pcs property set maintenance-mode=true
[root@UA-HA ~]# pcs property show maintenance-mode
Cluster Properties:
 maintenance-mode: true
[root@UA-HA ~]#

 

This eventually puts the resources into an unmanaged state.

[root@UA-HA ~]# pcs resource
 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA (unmanaged)
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA (unmanaged)
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA (unmanaged)
     webres     (ocf::heartbeat:apache):        Started UA-HA (unmanaged)
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2 (unmanaged)
[root@UA-HA ~]#

 

7. Update the /etc/hosts with following entries on both the nodes.

[root@UA-HA corosync]# cat /etc/hosts |grep HB2
172.16.0.3     UA-HA-HB2
172.16.0.2     UA-HA2-HB2
[root@UA-HA corosync]#

 

8. Update corosync.conf with rrp_mode & ring1_addr on both nodes.

[root@UA-HA corosync]# cat corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: UABLR
    transport: udpu
    rrp_mode: active
}

nodelist {
    node {
        ring0_addr: UA-HA
        ring1_addr: UA-HA-HB2
        nodeid: 1
    }

    node {
        ring0_addr: UA-HA2
        ring1_addr: UA-HA2-HB2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
[root@UA-HA corosync]#

 

Here is the difference between the new configuration file and the previous one.

[root@UA-HA corosync]# sdiff -s corosync.conf corosync.conf_back
   rrp_mode: active                                           <
        ring1_addr: UA-HA-HB2                                 <
        ring1_addr: UA-HA2-HB2                                <
[root@UA-HA corosync]#

 

9. Restart the corosync services on both the nodes.

[root@UA-HA ~]# systemctl restart corosync
[root@UA-HA ~]# ssh UA-HA2 systemctl restart corosync

 

10. Check the corosync service status.

[root@UA-HA ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2015-10-19 02:38:16 EDT; 16s ago
  Process: 36462 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
  Process: 36470 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
 Main PID: 36477 (corosync)
   CGroup: /system.slice/corosync.service
           └─36477 corosync

Oct 19 02:38:15 UA-HA corosync[36477]:  [QUORUM] Members[2]: 2 1
Oct 19 02:38:15 UA-HA corosync[36477]:  [MAIN  ] Completed service synchronization, ready to provide service.
Oct 19 02:38:16 UA-HA systemd[1]: Started Corosync Cluster Engine.
Oct 19 02:38:16 UA-HA corosync[36470]: Starting Corosync Cluster Engine (corosync): [  OK  ]
Oct 19 02:38:24 UA-HA corosync[36477]:  [TOTEM ] A new membership (192.168.203.134:3244) was formed. Members left: 2
Oct 19 02:38:24 UA-HA corosync[36477]:  [QUORUM] Members[1]: 1
Oct 19 02:38:24 UA-HA corosync[36477]:  [MAIN  ] Completed service synchronization, ready to provide service.
Oct 19 02:38:25 UA-HA corosync[36477]:  [TOTEM ] A new membership (192.168.203.131:3248) was formed. Members joined: 2
Oct 19 02:38:26 UA-HA corosync[36477]:  [QUORUM] Members[2]: 2 1
Oct 19 02:38:26 UA-HA corosync[36477]:  [MAIN  ] Completed service synchronization, ready to provide service.
[root@UA-HA ~]#

 

11. Verify the corosync configuration using pcs command.

[root@UA-HA ~]# pcs cluster corosync
totem {
    version: 2
    secauth: off
    cluster_name: UABLR
    transport: udpu
   rrp_mode: active
}

nodelist {
    node {
        ring0_addr: UA-HA
        ring1_addr: UA-HA-HB2
        nodeid: 1
    }

    node {
        ring0_addr: UA-HA2
        ring1_addr: UA-HA2-HB2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

[root@UA-HA ~]#

 

12. Verify the ring status.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = ring 0 active with no faults
RING ID 1
        id      = 172.16.0.3
        status  = ring 1 active with no faults
[root@UA-HA ~]# ssh UA-HA2 corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
        id      = 192.168.203.131
        status  = ring 0 active with no faults
RING ID 1
        id      = 172.16.0.2
        status  = ring 1 active with no faults
[root@UA-HA ~]#

 

You can also check the ring status using the following command.

[root@UA-HA ~]# corosync-cmapctl |grep member
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.203.134) r(1) ip(172.16.0.3)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.203.131) r(1) ip(172.16.0.2)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
[root@UA-HA ~]#

We have successfully configured redundant rings  for corosync .

 

13. Clear the cluster maintenance mode.

[root@UA-HA ~]# pcs property unset maintenance-mode

or 

[root@UA-HA ~]#  pcs property set maintenance-mode=false

[root@UA-HA ~]# pcs resource
 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2
[root@UA-HA ~]#

 

Let’s break it !!

You can easily test the rrp_mode by pulling out the network cable from one of the configured interfaces. I have simply used the "ifconfig br0 down" command to simulate this test on the UA-HA2 node, assuming that the application/DB is using a different interface.
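
For reference, the commands used for this test are simply the following (run them from the console or over a different interface, since br0 carries the node's primary IP):

[root@UA-HA2 ~]# ifconfig br0 down    # simulate loss of the ring 0 link on UA-HA2
... observe the ring status from UA-HA ...
[root@UA-HA2 ~]# ifconfig br0 up      # restore the link once the test is complete (done later below)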

[root@UA-HA ~]# ping UA-HA2
PING UA-HA2 (192.168.203.131) 56(84) bytes of data.
^C
--- UA-HA2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1002ms

[root@UA-HA ~]#

 

Check the ring status. We can see that ring 0 has been marked as faulty.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = Marking ringid 0 interface 192.168.203.134 FAULTY
RING ID 1
        id      = 172.16.0.3
        status  = ring 1 active with no faults
[root@UA-HA ~]#

 

You can see that the cluster is running perfectly without any issues.

[root@UA-HA ~]# pcs resource
 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2
[root@UA-HA ~]#

 

Bring the br0 interface back up using "ifconfig br0 up". Ring 0 is back online.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = ring 0 active with no faults
RING ID 1
        id      = 172.16.0.3
        status  = ring 1 active with no faults
[root@UA-HA ~]#

Hope this article is informative to you. Share it ! Comment it !! Be Sociable !!!

The post RHEL 7 – Pacemaker – Configure Redundant Corosync Links on Fly– Part 10 appeared first on UnixArena.


RHEL 7 – Accessing the Pacemaker WEB UI (GUI) – Part 11

$
0
0

Pacemaker offers a web-based user interface to manage the cluster. It also provides an interface to manage multiple clusters from a single web UI. We can't really say that the web UI has all the options needed to manage the cluster; I would say that the command line is much easier and simpler compared to the GUI. However, you could give the pacemaker web UI a try. It uses port 2224 and you can access the web UI portal using "https://nodename:2224".
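
Before opening the URL, you may want to confirm that pcsd is actually listening on port 2224 on the node. A quick check (standard ss usage; the firewall-cmd lines are only needed if firewalld is running):

[root@UA-HA ~]# ss -tlnp | grep 2224                                        # pcsd should be listening on *:2224
[root@UA-HA ~]# firewall-cmd --permanent --add-service=high-availability   # only if firewalld is enabled
[root@UA-HA ~]# firewall-cmd --reload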

The web UI is limited to the following tasks:

  • Create a new cluster
  • Add an existing cluster to the GUI
  • Manage the cluster nodes (stop, start, standby)
  • Configure the fence devices
  • Configure the cluster resources
  • Resource attributes (order, location, colocation, meta attributes)
  • Set the cluster properties
  • Create roles

I don't see any option to switch the resources over from one node to another. There is also no way to verify & configure the corosync rings.

 

Let’s access the web UI portal of pacemaker.

1. No additional setup is required to access the pacemaker web UI from the cluster nodes. By default, the pcs package is installed as a part of the cluster package installation.

 

2. pcsd.service is responsible for the web UI.

[root@UA-HA ~]# systemctl status pcsd
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2015-10-19 14:46:06 EDT; 2s ago
 Main PID: 55297 (pcsd)
   CGroup: /system.slice/pcsd.service
           ├─55297 /bin/sh /usr/lib/pcsd/pcsd start
           ├─55301 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
           ├─55302 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
           └─55315 python2 /usr/lib/pcsd/systemd-notify-fix.py

Oct 19 14:46:01 UA-HA systemd[1]: Starting PCS GUI and remote configuration interface...
Oct 19 14:46:06 UA-HA systemd[1]: Started PCS GUI and remote configuration interface.
[root@UA-HA ~]#

 

3. The pcsd configuration daemon uses the account called "hacluster". We have already set up the password during the initial cluster setup.
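
If the password was never set (or you have forgotten it), it can be (re)set on each node; a minimal example (the password string is just a placeholder):

[root@UA-HA ~]# passwd hacluster                                              # prompts for the new password
[root@UA-HA ~]# ssh UA-HA2 "echo 'MyStrongPass' | passwd --stdin hacluster"   # non-interactive variant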

 

4. Let's launch the pacemaker web UI. You can use any one of the nodes' IP addresses to access it.

Pacemaker Corosync Web UI

 

5. Log in with the "hacluster" user credentials.

 

6. By default, there won't be any clusters added to the portal. Since we already have a configured cluster, let's add it to the web UI. Click the "+Add Existing" link.

Pacemaker Web UI – Add Cluster

 

7. Enter one of the cluster nodes' IP address and click "Add Existing". This process will automatically pull the cluster information into the web UI.

Add the pacemaker cluster to Web UI

 

In the same way, you can add any number of clusters to the single web UI, so that you can manage all the clusters from one place.

 

8. Select the cluster which you would like to manage using Web UI.

Select the cluster

 

 

9. By default, it will take you to the "Nodes" tab.

Pacemaker Corosync Node status

 

Here you could see the following options.

  • Stop/start/restart the cluster services on specific node
  • Move the node in to standby mode.
  • Configure Fencing.

 

10. Have a look at the resource management tab.

Pacemaker Resource Management tab

 

 

11. The next tab is exclusively for configuring & managing fencing.

 

12. The ACLs tab provides an option to create roles with custom rules (for example, providing read-only access to a set of users or groups).

 

13.  In “cluster properties” tab, you can find the following options.

Cluster properties

 

14. The last tab will take you back to the cluster list (see the screen in step 8).

 

I personally felt that the pacemaker web UI is limited to specific tasks. The pacemaker (pcs) command line looks simple and powerful.

Hope this article is informative to you. Share it ! Comment it ! Be Sociable !!!

The post RHEL 7 – Accessing the Pacemaker WEB UI (GUI) – Part 11 appeared first on UnixArena.

RHEL 7 – How to configure the Fencing on Pacemaker ?

$
0
0

Fencing (STONITH) is an important mechanism in a cluster to avoid data corruption on shared storage. It also helps to bring the cluster into a known state when a split brain occurs between the nodes. Cluster nodes talk to each other over communication channels, which are typically standard network connections such as Ethernet. Each resource and node has a "state" (e.g. started, stopped) in the cluster, and nodes report every change that happens to a resource. This reporting works well until communication breaks between the nodes. Fencing comes into play when the nodes can't communicate with each other: the majority of nodes form the cluster based on quorum votes, and the rest of the nodes are rebooted or halted based on the fencing actions we have defined.

 

There are two type of fencing available in pacemaker.

  • Resource Level Fencing
  • Node Level Fencing

Using resource level fencing, the cluster can make sure that a resource cannot be active on more than one node at the same time. Node level fencing makes sure that a node does not run any resources at all. This is usually done in a very simple, yet brutal way: the node is simply reset using a power switch. This may ultimately be necessary because the node may not be responsive at all. In a Pacemaker/corosync cluster, the fencing method is called "STONITH" (Shoot The Other Node In The Head).

 

For more information, please visit clusterlabs.org. Here we will look at node level fencing.

 

Have a look at the cluster setup.

[root@Node1-LAB ~]# pcs status
Cluster name: GFSCLUS
Last updated: Wed Jan 20 12:43:36 2016
Last change: Wed Jan 20 09:57:06 2016 via cibadmin on Node1
Stack: corosync
Current DC: Node1 (1) - partition with quorum
Version: 1.1.10-29.el7-368c726
2 Nodes configured
2 Resources configured


Online: [ Node1 Node2 ]

PCSD Status:
  Node1: Online
  Node2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@Node1-LAB ~]#

 

In this article, we will see how to configure the "fence_xvm" stonith/fencing agent for KVM cluster nodes. The purpose of this setup is to demonstrate STONITH/fencing.

 

Environment:  (Demo Purpose only)

  • Node 1 & Node 2  – Pacemaker/corosync cluster
  • UNIXKB-CP  – KVM host which hosts Node1 & Node2

 

Configure KVM host to use fence_xvm:

1. Login to the KVM host.

2. List the running virtual machines.

[root@UNIXKB-CP ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 6     Node1                          running
 7     Node2                          running

[root@UNIXKB-CP ~]#

 

3. Install the required fencing packages on the KVM host (non-cluster node).

[root@UNIXKB-CP ~]# yum install fence-virt fence-virtd fence-virtd-libvirt fence-virtd-multicast fence-virtd-serial
Loaded plugins: langpacks, product-id, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Package fence-virt-0.3.0-16.el7.x86_64 already installed and latest version
Package fence-virtd-0.3.0-16.el7.x86_64 already installed and latest version
Package fence-virtd-libvirt-0.3.0-16.el7.x86_64 already installed and latest version
Package fence-virtd-multicast-0.3.0-16.el7.x86_64 already installed and latest version
Package fence-virtd-serial-0.3.0-16.el7.x86_64 already installed and latest version
Nothing to do
[root@UNIXKB-CP ~]#

 

4. Create a new directory to store the fence key and generate a random key to use for fencing.

[root@UNIXKB-CP ~]# mkdir -p /etc/cluster
[root@UNIXKB-CP ~]# cd /etc/cluster/
[root@UNIXKB-CP cluster]# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000506736 s, 8.1 MB/s
[root@UNIXKB-CP cluster]#

 

5. Copy the fence keys to cluster nodes. (Node1 & Node2)

[root@UNIXKB-CP cluster]# scp -r /etc/cluster/fence_xvm.key root@Node1:/etc/cluster/fence_xvm.key
root@node1's password:
fence_xvm.key                                                                                                                      100% 4096     4.0KB/s   00:00
[root@UNIXKB-CP cluster]# scp -r /etc/cluster/fence_xvm.key root@Node2:/etc/cluster/fence_xvm.key
root@node2's password:
fence_xvm.key                                                                                                                      100% 4096     4.0KB/s   00:00
[root@UNIXKB-CP cluster]#

Note: You must create the "/etc/cluster" directory on the cluster nodes in order to copy the fence_xvm key.

 

6. Use the "fence_virtd -c" command to create the "/etc/fence_virt.conf" file.

[root@UNIXKB-CP ~]# fence_virtd -c
Module search path [/usr/lib64/fence-virt]:

Available backends:
    libvirt 0.1
Available listeners:
    multicast 1.2

Listener modules are responsible for accepting requests
from fencing clients.

Listener module [multicast]:

The multicast listener module is designed for use environments
where the guests and hosts may communicate over a network using
multicast.

The multicast address is the address that a client will use to
send fencing requests to fence_virtd.

Multicast IP Address [225.0.0.12]:

Using ipv4 as family.

Multicast IP Port [1229]:

Setting a preferred interface causes fence_virtd to listen only
on that interface.  Normally, it listens on all interfaces.
In environments where the virtual machines are using the host
machine as a gateway, this *must* be set (typically to virbr0).
Set to 'none' for no interface.

Interface [virbr0]: br0:1

The key file is the shared key information which is used to
authenticate fencing requests.  The contents of this file must
be distributed to each physical host and virtual machine within
a cluster.

Key File [/etc/cluster/fence_xvm.key]:

Backend modules are responsible for routing requests to
the appropriate hypervisor or management layer.

Backend module [libvirt]:

Configuration complete.

=== Begin Configuration ===
backends {
        libvirt {
                uri = "qemu:///system";
        }

}

listeners {
        multicast {
                port = "1229";
                family = "ipv4";
                interface = "br0:1";
                address = "225.0.0.12";
                key_file = "/etc/cluster/fence_xvm.key";
        }

}

fence_virtd {
        module_path = "/usr/lib64/fence-virt";
        backend = "libvirt";
        listener = "multicast";
}

=== End Configuration ===
Replace /etc/fence_virt.conf with the above [y/N]? y
[root@UNIXKB-CP ~]#

Make sure that you are providing the correct interface as the bridge. In my setup, I am using the br0:1 virtual interface to communicate with the KVM guests.

 

7. Start the fence_virtd service.

[root@UNIXKB-CP ~]# systemctl enable fence_virtd.service
[root@UNIXKB-CP ~]# systemctl start fence_virtd.service
[root@UNIXKB-CP ~]# systemctl status fence_virtd.service
fence_virtd.service - Fence-Virt system host daemon
   Loaded: loaded (/usr/lib/systemd/system/fence_virtd.service; enabled)
   Active: active (running) since Wed 2016-01-20 23:36:14 IST; 1s ago
  Process: 3530 ExecStart=/usr/sbin/fence_virtd $FENCE_VIRTD_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 3531 (fence_virtd)
   CGroup: /system.slice/fence_virtd.service
           └─3531 /usr/sbin/fence_virtd -w

Jan 20 23:36:14 UNIXKB-CP systemd[1]: Starting Fence-Virt system host daemon...
Jan 20 23:36:14 UNIXKB-CP systemd[1]: Started Fence-Virt system host daemon.
Jan 20 23:36:14 UNIXKB-CP fence_virtd[3531]: fence_virtd starting.  Listener: libvirt  Backend: multicast
[root@UNIXKB-CP ~]#

 

Configure the Fencing on Cluster Nodes:

1. Login to one of the cluster nodes.

2. Make sure that both nodes have the "fence-virt" package installed.

[root@Node1-LAB ~]# rpm -qa fence-virt
fence-virt-0.3.0-16.el7.x86_64
[root@Node1-LAB ~]#
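
Optionally, you can also confirm that the fence_xvm agent is visible to pcs and review the parameters it accepts (standard pcs stonith sub-commands):

[root@Node1-LAB ~]# pcs stonith list | grep xvm        # confirm fence_xvm is available
[root@Node1-LAB ~]# pcs stonith describe fence_xvm     # show the parameters accepted by the agent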

 

3. The following command must succeed in order to configure fencing in the cluster.

[root@Node1-LAB ~]# fence_xvm -o list
Node1                6daac670-c494-4e02-8d90-96cf900f2be9 on
Node2                17707dcb-7bcc-4b36-9498-a5963d86dc2f on
[root@Node1-LAB ~]#

 

4. The cluster node entries must be present in /etc/hosts.

[root@Node1-LAB ~]# cat /etc/hosts |grep Node
192.168.2.10    Node1-LAB  Node1
192.168.2.11    Node2-LAB  Node2
[root@Node1-LAB ~]#

 

5. Configure the fence_xvm fence agent on the pacemaker cluster.

[root@Node1-LAB ~]# pcs stonith create xvmfence  fence_xvm key_file=/etc/cluster/fence_xvm.key
[root@Node1-LAB ~]# 
[root@Node1-LAB ~]# pcs stonith
 xvmfence       (stonith:fence_xvm):    Started
[root@Node1-LAB ~]#
[root@Node1-LAB ~]# pcs stonith --full
 Resource: xvmfence (class=stonith type=fence_xvm)
  Attributes: key_file=/etc/cluster/fence_xvm.key
  Operations: monitor interval=60s (xvmfence-monitor-interval-60s)
[root@Node1-LAB ~]#

We have successfully configured fencing on the RHEL 7 Pacemaker/Corosync cluster. (The cluster has been configured between two KVM guests.)

 

Validate the STONITH:

 

How should I test my "stonith" configuration? Here is a small demonstration.

1. Login to the one of the cluster node.

2. Try to fence one of the nodes.

[root@Node1-LAB ~]# pcs stonith fence Node2
Node: Node2 fenced
[root@Node1-LAB ~]#

 

This will eventually reboot Node2. The reboot happens based on the stonith-action cluster property.

[root@Node1-LAB ~]# pcs property --all |grep stonith-action
 stonith-action: reboot
[root@Node1-LAB ~]#

 

STONITH can also be turned on/off using the pcs property command.

[root@Node1-LAB ~]# pcs property --all |grep stonith-enabled
 stonith-enabled: true
[root@Node1-LAB ~]#

 

Hope this article is informative to you.

The post RHEL 7 – How to configure the Fencing on Pacemaker ? appeared first on UnixArena.

Configuring NFS HA using Redhat Cluster – Pacemaker on RHEL 7

$
0
0

This article will help you to set up a highly available NFS server using Pacemaker on Red Hat Enterprise Linux 7. From scratch, we will build the pacemaker blocks, which include package installation, configuring the HA resources, fencing etc. NFS shares are commonly used for hosting home directories and sharing the same content across multiple servers. NFS HA will suit customers who can't afford NAS storage. You might have followed the Pacemaker articles on UnixArena where we set up a failover KVM VM and GFS earlier. If not, please go through them to understand the various components of pacemaker and how it works. This article is not going to cover them in depth.

 

NFS HA – Pacemaker UnixArena

 

Assumptions:

  • Two servers installed with RHEL 7.x (Hosts- UA-HA1 / UA-HA2)
  • Access to Redhat Repository or Local Repository to install packages.
  • SELINUX & Firewalld can be turned off.

 

1. Log in to each node as the root user and install the packages.

# yum install pcs fence-agents-all

 

2. Disable SELinux on both nodes.

# setenforce 0
setenforce: SELinux is disabled
# cat /etc/selinux/config |grep SELINUX |grep -v "#"
SELINUX=disabled
SELINUXTYPE=targeted

 

3. Disable firewalld on both hosts.

UA-HA# systemctl stop firewalld.service
UA-HA# systemctl disable firewalld.service
UA-HA# iptables --flush
UA-HA#

 

4. Enable and start the pcsd service on both nodes.

# systemctl start pcsd.service
# systemctl enable pcsd.service
# systemctl status pcsd.service

 

5. On each node, set the password for the hacluster user.

# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

 

6. Log in to any one of the cluster nodes and authenticate the "hacluster" user.

# pcs cluster auth UA-HA1 UA-HA2

 

7. Create a new cluster using pcs command. The cluster name is “UACLS”.

# pcs cluster setup --name UACLS UA-HA1 UA-HA2

 

8. Start the cluster using the pcs command. "--all" will start the cluster on all the configured nodes.

# pcs cluster start --all

 

9. Check the corosync communication status. The command output shows which IP is being used for the heartbeat. Refer to the configure-redundant-corosync article.
# corosync-cfgtool -s

 

10. Disable STONITH to avoid issues while configuring the resources. Once we complete the cluster setup, we will enable fencing again.

#pcs property set stonith-enabled=false
#pcs property show stonith-enabled

 

11. Configure the fencing (STONITH) using fence_ipmilan.

# pcs stonith create UA-HA1_fen fence_ipmilan pcmk_host_list="UA-HA1" ipaddr=192.168.10.24 login=root  passwd=test123 lanplus=1 cipher=1 op monitor interval=60s
#pcs stonith create UA-HA2_fen fence_ipmilan pcmk_host_list="UA-HA2" ipaddr=192.168.10.25 login=root  passwd=test123 lanplus=1 cipher=1 op monitor interval=60s

These IPs are the iDRAC console IPs used for fencing.
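
Before relying on these fence devices, it is worth checking that the iDRAC/IPMI interfaces are actually reachable with the given credentials. A quick check using the standard ipmitool client (same IPs/credentials as above) might look like this:

[root@UA-HA1 ~]# ipmitool -I lanplus -H 192.168.10.24 -U root -P test123 chassis power status
[root@UA-HA1 ~]# ipmitool -I lanplus -H 192.168.10.25 -U root -P test123 chassis power status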

 

12. Verify the cluster configuration.

# crm_verify -L -V

 

13. Configure the volume groups and logical volumes.

# vgcreate UAVG1 /dev/disk_name1
# vgcreate UAVG2 /dev/disk_name2

# lvcreate -L <size>M -n UAVOL1 UAVG1
# lvcreate -L <size>M -n UAVOL2 UAVG2

 

14. Create the filesystems. (Let's go with XFS)

#mkfs.xfs /dev/UAVG1/UAVOL1
#mkfs.xfs /dev/UAVG2/UAVOL2

 

15. Modify the LVM configuration similar to below. This assumes that all the volume groups are used by the cluster. If you have a local root VG, you need to specify it in lvm.conf for automatic activation (see the sketch below).

# grep use_lvmetad /etc/lvm/lvm.conf |grep -v "#"
use_lvmetad = 0
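
As a sketch of that lvm.conf change (the VG name "rhel_root" below is just a placeholder for your local/root volume group; the clustered VGs UAVG1/UAVG2 must NOT be listed, so that only the cluster activates them):

# /etc/lvm/lvm.conf (activation section) - illustrative only
volume_list = [ "rhel_root" ]

# rebuild the initramfs afterwards so the change is honoured at boot (standard RHEL 7 step)
# dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)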

 

16. Configure the symmetric cluster property and check the status.

# pcs property set symmetric-cluster=true
[root@UA-HA1 tmp]# pcs status
Cluster name: UACLS
Stack: corosync
Current DC: UA-HA2 (2) - partition with quorum
2 Nodes configured
2 Resources configured

Online: [ UA-HA1 UA-HA2 ]

Full list of resources:

 UA-HA1_fen   (stonith:fence_ipmilan):        Started UA-HA2
 UA-HA2_fen   (stonith:fence_ipmilan):        Started UA-HA1

PCSD Status:
  UA-HA1: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA1 tmp]# pcs stonith show
 UA-HA1_fen   (stonith:fence_ipmilan):        Started
 UA-HA2_fen   (stonith:fence_ipmilan):        Started
[root@UA-HA1 tmp]#

 

17. Configure the VG and filesystem mount resources.

#pcs resource create UAVG1_res LVM volgrpname="UAVG1" exclusive=true  --group UANFSHA
#pcs resource create UAVOL1_res Filesystem  device="/dev/UAVG1/UAVOL1" directory="/cm/shared" fstype="xfs" --group UANFSHA


#pcs resource create UAVG2_res LVM volgrpname="UAVG2" exclusive=true --group  UANFSHA
#pcs resource create UAVOL2_res Filesystem  device="/dev/UAVG2/UAVOL2" directory="/global/home" fstype="xfs" --group UANFSHA

 

18. Configure the VIP for the NFS shares. This IP will be used on the NFS clients to mount the shares.

# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.2.90  nic="eth0" cidr_netmask=24 op monitor interval=30s --group UANFSHA

 

19. Configure the NFS server resource.

[root@UA-HA1 ~]# pcs resource create NFS-D nfsserver nfs_shared_infodir=/global/nfsinfo nfs_ip=192.168.2.90  --group UANFSHA

 

20. Check the cluster status.

[root@UA-HA1 ~]# pcs status
Cluster name: UACLS
Last updated: Tue Aug 16 12:39:22 2016
Last change: Tue Aug 16 12:39:19 2016 via cibadmin on UA-HA1
Stack: corosync
Current DC: UA-HA1 (1) - partition with quorum
Version: 1.1.10-29.el7-368c726
2 Nodes configured
8 Resources configured

Online: [ UA-HA1 UA-HA2 ]

Full list of resources:

 UA-HA1_fen   (stonith:fence_ipmilan):        Started UA-HA1
 UA-HA2_fen   (stonith:fence_ipmilan):        Started UA-HA1
 Resource Group: UANFSHA
     UAVG1_res  (ocf::heartbeat:LVM):   Started UA-HA1
     UAVG2_res  (ocf::heartbeat:LVM):   Started UA-HA1
     UAVOL1_res  (ocf::heartbeat:Filesystem):    Started UA-HA1
     UAVOL2_res  (ocf::heartbeat:Filesystem):    Started UA-HA1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA1
     NFS-D      (ocf::heartbeat:nfsserver):     Started UA-HA1 

 

21. Configure the HA NFS shares.

[root@UA-HA1 ~]# pcs resource create nfs-cm-shared exportfs clientspec=192.168.2.0/255.255.255.0 options=rw,sync,no_root_squash directory=/SAP_SOFT fsid=0 --group UANFSHA
[root@UA-HA1 ~]# pcs resource create nfs-global-home  exportfs clientspec=10.248.102.0/255.255.255.0 options=rw,sync,no_root_squash directory=/users1/home fsid=1 --group UANFSHA

 

22. The final cluster status will look similar to the following.

[root@UA-HA1 ~]# pcs status
Cluster name: UACLS
Last updated: Tue Aug 16 12:52:43 2016
Last change: Tue Aug 16 12:51:56 2016 via cibadmin on UA-HA1
Stack: corosync
Current DC: UA-HA1 (1) - partition with quorum
Version: 1.1.10-29.el7-368c726
2 Nodes configured
10 Resources configured


Online: [ UA-HA1 UA-HA2 ]

Full list of resources:

 UA-HA1_fen   (stonith:fence_ipmilan):        Started UA-HA1
 UA-HA2_fen   (stonith:fence_ipmilan):        Started UA-HA1
 Resource Group: UANFSHA
     UAVG1_res  (ocf::heartbeat:LVM):   Started UA-HA1
     UAVG2_res  (ocf::heartbeat:LVM):   Started UA-HA1
     UAVOL1_res  (ocf::heartbeat:Filesystem):    Started UA-HA1
     UAVOL2_res  (ocf::heartbeat:Filesystem):    Started UA-HA1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA1
     NFS-D      (ocf::heartbeat:nfsserver):     Started UA-HA1
     nfs-cm-shared      (ocf::heartbeat:exportfs):      Started UA-HA1
     nfs-global-home    (ocf::heartbeat:exportfs):      Started UA-HA1

 

23. Configure the resource ordering dependencies.

[root@UA-HA1 ~]# pcs constraint order start UAVG1_res then UAVOL1_res
[root@UA-HA1 ~]# pcs constraint order start UAVG2_res then UAVOL2_res
[root@UA-HA1 ~]# pcs constraint order start UAVOL1_res then ClusterIP
[root@UA-HA1 ~]# pcs constraint order start UAVOL2_res then ClusterIP 
[root@UA-HA1 ~]# pcs constraint order start ClusterIP then NFS-D
[root@UA-HA1 ~]# pcs  constraint order start NFS-D then nfs-cm-shared
[root@UA-HA1 ~]# pcs  constraint order start NFS-D then nfs-global-home
[root@UA-HA1 ~]# pcs constraint
Location Constraints:
  Resource: UANFSHA
    Enabled on: UA-HA1 (role: Started)
Ordering Constraints:
  start UAVG1_res then start UAVOL1_res
  start UAVG2_res then start UAVOL2_res
  start UAVOL1_res then start ClusterIP
  start UAVOL2_res then start ClusterIP
  start ClusterIP then start NFS-D
  start NFS-D then start nfs-cm-shared
  start NFS-D then start nfs-global-home
Colocation Constraints:
[root@UA-HA1 ~]#

 

24. You can also verify the NFS shares using the following command (execute it on the node where the resources are currently running).

[root@UA-HA1 ~]# showmount -e 192.168.2.90
Export list for 192.168.2.90:
/SAP_SOFT   192.168.2.0/255.255.255.0
/users1/home 192.168.2.0/255.255.255.0
[root@UA-HA1 ~]#

[root@UA-HA1 ~]# ifconfig eth0
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2044
        inet 192.168.2.90  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::7efe:9003:a7:851  prefixlen 64  scopeid 0x20

 

25. Enable STONITH.

#pcs property set stonith-enabled=true
#pcs property show stonith-enabled

 

26. Log in to the NFS clients and mount the shares.

# mkdir /users1/home 
# mkdir /SAP_SOFT
# mount -t nfs -o vers=4 192.168.2.90:/SAP_SOFT  /SAP_SOFT
# mount -t nfs -o vers=4 192.168.2.90:/users1/home  /users1/home
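
If the clients should mount these shares at every boot, you could also add fstab entries instead of mounting manually (a sketch; adjust paths and options to your environment):

# /etc/fstab on the NFS clients - illustrative entries
192.168.2.90:/SAP_SOFT      /SAP_SOFT      nfs4   defaults,_netdev   0 0
192.168.2.90:/users1/home   /users1/home   nfs4   defaults,_netdev   0 0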

We have successfully set up a highly available NFSv4 server using the pacemaker cluster suite on Red Hat Enterprise Linux 7.x.

If you have any trouble with the resources, use the following command to clear their state. A resource might be automatically banned from a node if it has faulted more than once.

 [root@UA-HA1 init.d]# pcs resource clear UANFSHA
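
If the problem was a resource failure rather than a manual move/ban, the failure counters can be inspected and reset as well (standard pcs sub-commands; resource names are from this setup):

[root@UA-HA1 ~]# pcs resource failcount show NFS-D     # inspect the fail count for a resource
[root@UA-HA1 ~]# pcs resource cleanup UANFSHA          # reset failures/state for the whole group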

Hope this article is informative to you. Share it ! Comment it ! Be Sociable !!!

The post Configuring NFS HA using Redhat Cluster – Pacemaker on RHEL 7 appeared first on UnixArena.

RHEL 7 – Installing Redhat Cluster Software (Corosync/pacemaker) – Part 3

$
0
0

In this article, we will see how to install the Redhat cluster software (Pacemaker) on RHEL 7. If you have a valid Redhat subscription, you can directly configure the Redhat repository and install the packages. It is also available in the RHEL 7 ISO image as an Add-on package. Unlike previous redhat cluster releases, the Redhat cluster 7 installation looks very simple since redhat has moved to pacemaker & corosync. Prior to proceeding with the installation, I would request you to go through the following articles.

 

 

Environment:

  • Operating System: Redhat Enterprise Linux 7.2
  • Repository : Local YUM Repository using RHEL 7.2 DVD ISO image.
  • Type of Cluster : Active / Passive – Two Node cluster
  • Cluster Resource : KVM guest (VirtualDomain)

 

 

YUM Repository configuration for OS , HA & Storage:

 

1. Copy the RHEL 7.2 DVD ISO image to the system or attach as DVD device.

2. Mount the ISO Image /DVD under “/repo”

[root@UA-HA ~]# df -h /repo
Filesystem      Size  Used Avail Use% Mounted on
/dev/sr1        3.8G  3.8G     0 100% /repo
[root@UA-HA ~]#

 

3. List the DVD contents.

[root@UA-HA ~]# ls -lrt /repo
total 872
-r--r--r--  1 root root  18092 Mar  6  2012 GPL
-r--r--r--  1 root root   8266 Apr  4  2014 EULA
-r--r--r--  1 root root   3211 Oct 23 09:25 RPM-GPG-KEY-redhat-release
-r--r--r--  1 root root   3375 Oct 23 09:25 RPM-GPG-KEY-redhat-beta
-r--r--r--  1 root root    114 Oct 30 10:54 media.repo
-r--r--r--  1 root root   1568 Oct 30 11:03 TRANS.TBL
dr-xr-xr-x  2 root root   4096 Oct 30 11:03 repodata
dr-xr-xr-x 24 root root   6144 Oct 30 11:03 release-notes
dr-xr-xr-x  2 root root 835584 Oct 30 11:03 Packages
dr-xr-xr-x  2 root root   2048 Oct 30 11:03 LiveOS
dr-xr-xr-x  2 root root   2048 Oct 30 11:03 isolinux
dr-xr-xr-x  3 root root   2048 Oct 30 11:03 images
dr-xr-xr-x  3 root root   2048 Oct 30 11:03 EFI
dr-xr-xr-x  4 root root   2048 Oct 30 11:03 addons
[root@UA-HA ~]#

4. Create the yum repository file with the name "ua.repo" and update it with the following contents (excluding the "cat" command line).

[root@UA-HA ~]# cat /etc/yum.repos.d/ua.repo
[repo-update]
gpgcheck=0
enabled=1
baseurl=file:///repo
name=repo-update

[repo-ha]
gpgcheck=0
enabled=1
baseurl=file:///repo/addons/HighAvailability
name=repo-ha

[repo-storage]
gpgcheck=0
enabled=1
baseurl=file:///repo/addons/ResilientStorage
name=repo-storage
[root@UA-HA ~]#

 

5. List the configured yum repositories.

[root@UA-HA ~]# yum repolist
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
repo id                                                                         repo name                                                                      status
!repo-ha                                                                        repo-ha                                                                           30
!repo-storage                                                                   repo-storage                                                                      37
!repo-update                                                                    repo-update                                                                    4,620
repolist: 4,687
[root@UA-HA ~]#

We have successfully configured the local YUM repository using the RHEL 7.2 ISO image.

 

 

Installing Cluster Packages on Nodes:

 

1. Log in to the RHEL 7.2 node as the root user.

 

2. Execute the following command to install the cluster packages and their dependencies. Corosync will be installed along with pacemaker.

[root@UA-HA ~]# yum install -y pacemaker pcs psmisc policycoreutils-python
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Resolving Dependencies
--> Running transaction check
---> Package pacemaker.x86_64 0:1.1.13-10.el7 will be installed
--> Processing Dependency: pacemaker-cli = 1.1.13-10.el7 for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: pacemaker-cluster-libs = 1.1.13-10.el7 for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: pacemaker-libs = 1.1.13-10.el7 for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: corosync for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcfg.so.6(COROSYNC_CFG_0.82)(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcmap.so.4(COROSYNC_CMAP_1.0)(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcpg.so.4(COROSYNC_CPG_1.0)(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libquorum.so.5(COROSYNC_QUORUM_1.0)(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: resource-agents for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcfg.so.6()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcib.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcmap.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcorosync_common.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcpg.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcrmcluster.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcrmcommon.so.3()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libcrmservice.so.3()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: liblrmd.so.1()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libpe_rules.so.2()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libpe_status.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libpengine.so.4()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libquorum.so.5()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libstonithd.so.2()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
--> Processing Dependency: libtransitioner.so.2()(64bit) for package: pacemaker-1.1.13-10.el7.x86_64
---> Package pcs.x86_64 0:0.9.143-15.el7 will be installed
---> Package policycoreutils-python.x86_64 0:2.2.5-20.el7 will be installed
---> Package psmisc.x86_64 0:22.20-9.el7 will be installed
--> Running transaction check
---> Package corosync.x86_64 0:2.3.4-7.el7 will be installed
---> Package corosynclib.x86_64 0:2.3.4-7.el7 will be installed
---> Package pacemaker-cli.x86_64 0:1.1.13-10.el7 will be installed
---> Package pacemaker-cluster-libs.x86_64 0:1.1.13-10.el7 will be installed
---> Package pacemaker-libs.x86_64 0:1.1.13-10.el7 will be installed
---> Package resource-agents.x86_64 0:3.9.5-54.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================================
 Package                       Arch          Version                Repository          Size
=============================================================================================
Installing:
 pacemaker                     x86_64        1.1.13-10.el7          repo-ha            462 k
 pcs                           x86_64        0.9.143-15.el7         repo-ha            4.7 M
 policycoreutils-python        x86_64        2.2.5-20.el7           repo-update        435 k
 psmisc                        x86_64        22.20-9.el7            repo-update        140 k
Installing for dependencies:
 corosync                      x86_64        2.3.4-7.el7            repo-ha            210 k
 corosynclib                   x86_64        2.3.4-7.el7            repo-ha            124 k
 pacemaker-cli                 x86_64        1.1.13-10.el7          repo-ha            253 k
 pacemaker-cluster-libs        x86_64        1.1.13-10.el7          repo-ha             92 k
 pacemaker-libs                x86_64        1.1.13-10.el7          repo-ha            519 k
 resource-agents               x86_64        3.9.5-54.el7           repo-ha            339 k

Transaction Summary
============================================================================================
Install  4 Packages (+6 Dependent packages)

Total download size: 7.3 M
Installed size: 19 M
Downloading packages:
--------------------------------------------------------------------------------------------
Total                         19 MB/s | 7.3 MB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : corosynclib-2.3.4-7.el7.x86_64                  1/10
  Installing : corosync-2.3.4-7.el7.x86_64                     2/10
  Installing : pacemaker-libs-1.1.13-10.el7.x86_64             3/10
  Installing : pacemaker-cli-1.1.13-10.el7.x86_64              4/10
  Installing : psmisc-22.20-9.el7.x86_64                       5/10
  Installing : resource-agents-3.9.5-54.el7.x86_64             6/10
  Installing : pacemaker-cluster-libs-1.1.13-10.el7.x86_64     7/10
  Installing : pacemaker-1.1.13-10.el7.x86_64                  8/10
  Installing : pcs-0.9.143-15.el7.x86_64                       9/10
  Installing : policycoreutils-python-2.2.5-20.el7.x86_64     10/10
  Verifying  : pcs-0.9.143-15.el7.x86_64                       1/10
  Verifying  : corosync-2.3.4-7.el7.x86_64                     2/10
  Verifying  : pacemaker-cli-1.1.13-10.el7.x86_64              3/10
  Verifying  : psmisc-22.20-9.el7.x86_64                       4/10
  Verifying  : resource-agents-3.9.5-54.el7.x86_64             5/10
  Verifying  : pacemaker-cluster-libs-1.1.13-10.el7.x86_64     6/10
  Verifying  : pacemaker-libs-1.1.13-10.el7.x86_64             7/10
  Verifying  : pacemaker-1.1.13-10.el7.x86_64                  8/10
  Verifying  : policycoreutils-python-2.2.5-20.el7.x86_64      9/10
  Verifying  : corosynclib-2.3.4-7.el7.x86_64                 10/10

Installed:
  pacemaker.x86_64 0:1.1.13-10.el7                  pcs.x86_64 0:0.9.143-15.el7        
policycoreutils-python.x86_64 0:2.2.5-20.el7        psmisc.x86_64 0:22.20-9.el7

Dependency Installed:
  corosync.x86_64 0:2.3.4-7.el7          corosynclib.x86_64 0:2.3.4-7.el7       
  pacemaker-cli.x86_64 0:1.1.13-10.el7   pacemaker-cluster-libs.x86_64 0:1.1.13-10.el7
  pacemaker-libs.x86_64 0:1.1.13-10.el7  resource-agents.x86_64 0:3.9.5-54.el7

Complete!
[root@UA-HA ~]#

 

We have successfully installed the cluster packages.

 

Note: crmsh, which is an alternative to the pcs commands, is not shipped with RHEL 7.

 

In my cluster environment, I have disabled the firewall and SELinux to avoid complexity.

[root@UA-HA ~]# setenforce 0
setenforce: SELinux is disabled
[root@UA-HA ~]#
[root@UA-HA ~]# cat /etc/selinux/config |grep SELINUX |grep -v "#"
SELINUX=disabled
SELINUXTYPE=targeted
[root@UA-HA ~]#
[root@UA-HA ~]# systemctl stop firewalld.service
[root@UA-HA ~]# systemctl disable firewalld.service
[root@UA-HA ~]# iptables --flush
[root@UA-HA ~]# 
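If you prefer to keep firewalld running, it ships with a pre-defined “high-availability” service that opens the ports used by corosync, pacemaker and pcsd; a sketch (run on both nodes):

# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --add-service=high-availability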

 

Hope this article is informative to you. In the next article, we will see how to configure the cluster using pacemaker.

Share it ! Comment it !! Be Sociable !!!

The post RHEL 7 – Installing Redhat Cluster Software (Corosync/pacemaker) – Part 3 appeared first on UnixArena.

RHEL 7 – Configuring Pacemaker/Corosync – Redhat Cluster – Part 4


In this article, we will see how to configure a two-node Redhat cluster using pacemaker & corosync on RHEL 7.2. Once you have installed the necessary packages, you need to enable the cluster services at system start-up and start them before kicking off the cluster configuration. The “hacluster” user is created automatically during the package installation with a disabled password. The cluster uses this user to sync the cluster configuration and to start and stop the cluster on the cluster nodes.

 

Environment:

  • Operating System: Redhat Enterprise Linux 7.2
  • Type of Cluster :  Two Node cluster – Failover
  • Nodes: UA-HA & UA-HA2  (Assuming that packages have been installed on both the nodes)
  • Cluster Resource : KVM guest (VirtualDomain)  –  See in Next Article.

 

Hardware configuration: 

  1. CPU – 2
  2. Memory – 4GB
  3. NFS – For shared storage

 

Redhat Cluster 7 – RHEL 7 – PCS

 

Enable & Start  the Services on both the Nodes:

 

1.Login to both the cluster nodes as root user.

2. Enable the pcsd daemon on both nodes so that it starts automatically at boot. pcsd is the pacemaker/corosync configuration daemon (not a cluster service).

[root@UA-HA ~]# systemctl start pcsd.service
[root@UA-HA ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@UA-HA ~]# systemctl status pcsd.service
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:22:08 EST; 14s ago
 Main PID: 18411 (pcsd)
   CGroup: /system.slice/pcsd.service
           ├─18411 /bin/sh /usr/lib/pcsd/pcsd start
           ├─18415 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
           └─18416 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb

Dec 27 23:22:07 UA-HA systemd[1]: Starting PCS GUI and remote configuration interface...
Dec 27 23:22:08 UA-HA systemd[1]: Started PCS GUI and remote configuration interface.
[root@UA-HA ~]#

 

3. Set a new password for the cluster user “hacluster” on both the nodes.

[root@UA-HA ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@UA-HA ~]#
[root@UA-HA2 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@UA-HA2 ~]#
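To script this step, the password can also be set non-interactively; a sketch, using “redhat123” purely as a placeholder password:

# echo "redhat123" | passwd --stdin hacluster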


Configure corosync & Create new cluster:

 

1. Log in to any one of the cluster nodes and authenticate the “hacluster” user.

[root@UA-HA ~]# pcs cluster auth UA-HA UA-HA2
Username: hacluster
Password:
UA-HA: Authorized
UA-HA2: Authorized
[root@UA-HA ~]#

 

2. Create a new cluster using the pcs command.

[root@UA-HA ~]# pcs cluster setup --name UABLR UA-HA UA-HA2
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
UA-HA: Succeeded
UA-HA2: Succeeded
Synchronizing pcsd certificates on nodes UA-HA, UA-HA2...
UA-HA: Success
UA-HA2: Success

Restaring pcsd on the nodes in order to reload the certificates...
UA-HA: Success
UA-HA2: Success
[root@UA-HA ~]#

 

3. Check the cluster status.

[root@UA-HA ~]# pcs status
Error: cluster is not currently running on this node
[root@UA-HA ~]#

You see this error because the cluster service has not been started yet.

 

4. Start the cluster using the pcs command. The “--all” option starts the cluster on all the configured nodes.

[root@UA-HA ~]# pcs cluster start --all
UA-HA2: Starting Cluster...
UA-HA: Starting Cluster...
[root@UA-HA ~]#

 

In the back-end, the “pcs cluster start” command triggers the following commands on each cluster node.

# systemctl start corosync.service
# systemctl start pacemaker.service

 

5. Check the cluster services status.

[root@UA-HA ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:34:31 EST; 11s ago
  Process: 18994 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
 Main PID: 19001 (corosync)
   CGroup: /system.slice/corosync.service
           └─19001 corosync

Dec 27 23:34:31 UA-HA corosync[19001]:  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] Members[1]: 1
Dec 27 23:34:31 UA-HA corosync[19001]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec 27 23:34:31 UA-HA corosync[19001]:  [TOTEM ] A new membership (192.168.203.131:1464) was formed. Members joined: 2
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] This node is within the primary component and will provide service.
Dec 27 23:34:31 UA-HA corosync[19001]:  [QUORUM] Members[2]: 2 1
Dec 27 23:34:31 UA-HA corosync[19001]:  [MAIN  ] Completed service synchronization, ready to provide service.
Dec 27 23:34:31 UA-HA systemd[1]: Started Corosync Cluster Engine.
Dec 27 23:34:31 UA-HA corosync[18994]: Starting Corosync Cluster Engine (corosync): [  OK  ]
[root@UA-HA ~]# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2015-12-27 23:34:32 EST; 15s ago
 Main PID: 19016 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─19016 /usr/sbin/pacemakerd -f
           ├─19017 /usr/libexec/pacemaker/cib
           ├─19018 /usr/libexec/pacemaker/stonithd
           ├─19019 /usr/libexec/pacemaker/lrmd
           ├─19020 /usr/libexec/pacemaker/attrd
           ├─19021 /usr/libexec/pacemaker/pengine
           └─19022 /usr/libexec/pacemaker/crmd

Dec 27 23:34:33 UA-HA crmd[19022]:   notice: pcmk_quorum_notification: Node UA-HA2[2] - state is now member (was (null))
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: pcmk_quorum_notification: Node UA-HA[1] - state is now member (was (null))
Dec 27 23:34:33 UA-HA stonith-ng[19018]:   notice: Watching for stonith topology changes
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: Notifications disabled
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: The local CRM is operational
Dec 27 23:34:33 UA-HA crmd[19022]:   notice: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Dec 27 23:34:33 UA-HA attrd[19020]:  warning: Node names with capitals are discouraged, consider changing 'UA-HA2' to something else
Dec 27 23:34:33 UA-HA attrd[19020]:   notice: crm_update_peer_proc: Node UA-HA2[2] - state is now member (was (null))
Dec 27 23:34:33 UA-HA stonith-ng[19018]:  warning: Node names with capitals are discouraged, consider changing 'UA-HA2' to something else
Dec 27 23:34:34 UA-HA stonith-ng[19018]:   notice: crm_update_peer_proc: Node UA-HA2[2] - state is now member (was (null))
[root@UA-HA ~]#

 

Verify Corosync configuration:

 

1. Check the corosync communication status.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = ring 0 active with no faults
[root@UA-HA ~]#

 

In my setup, the first ring is using the interface “br0”.

[root@UA-HA ~]# ifconfig br0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.203.134  netmask 255.255.255.0  broadcast 192.168.203.255
        inet6 fe80::84ef:2eff:fee9:260a  prefixlen 64  scopeid 0x20
        ether 00:0c:29:2d:3f:ce  txqueuelen 0  (Ethernet)
        RX packets 15797  bytes 1877460 (1.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 7018  bytes 847881 (828.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@UA-HA ~]#

We can configure multiple rings to provide redundancy for the cluster communication (comparable to LLT links in VCS).
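Additional rings are defined at cluster setup time by giving each node a second address; a sketch, assuming hypothetical alternate hostnames UA-HA-hb and UA-HA2-hb resolving to a second network:

# pcs cluster setup --name UABLR UA-HA,UA-HA-hb UA-HA2,UA-HA2-hb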

 

2. Check the membership and quorum APIs.

[root@UA-HA ~]# corosync-cmapctl  | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.203.134)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.203.131)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
[root@UA-HA ~]#
[root@UA-HA ~]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         2          1 UA-HA2
         1          1 UA-HA (local)
[root@UA-HA ~]#

 

 

Verify Pacemaker Configuration:

 

1. Check the running pacemaker processes.

[root@UA-HA ~]# ps axf |grep pacemaker
19324 pts/0    S+     0:00  |       \_ grep --color=auto pacemaker
19016 ?        Ss     0:00 /usr/sbin/pacemakerd -f
19017 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib
19018 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
19019 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
19020 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
19021 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
19022 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd

 

2. Check the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sun Dec 27 23:44:44 2015          Last change: Sun Dec 27 23:34:55 2015 by hacluster via crmd on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:


PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

3. You can see that corosync & pacemaker are active now but disabled across system reboots. If you would like to start the cluster automatically at boot, you can enable them using the systemctl command.

[root@UA-HA2 ~]# systemctl enable corosync
Created symlink from /etc/systemd/system/multi-user.target.wants/corosync.service to /usr/lib/systemd/system/corosync.service.
[root@UA-HA2 ~]# systemctl enable pacemaker
Created symlink from /etc/systemd/system/multi-user.target.wants/pacemaker.service to /usr/lib/systemd/system/pacemaker.service.
[root@UA-HA2 ~]# pcs status
Cluster name: UABLR
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sun Dec 27 23:51:30 2015          Last change: Sun Dec 27 23:34:55 2015 by hacluster via crmd on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:


PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA2 ~]#

 

4. When the cluster starts, it automatically records the number and details of the nodes in the cluster, as well as which stack is being used and the version of Pacemaker being used. To view the cluster configuration (Cluster Information Base – CIB) in XML format, use the following command.

[root@UA-HA2 ~]# pcs cluster cib

 

5. Verify the cluster information base using the following command.

[root@UA-HA ~]# crm_verify -L -V
   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
[root@UA-HA ~]#

By default, pacemaker enables STONITH (Shoot The Other Node In The Head) / fencing in order to protect the data. Fencing is mandatory when you use shared storage, to avoid data corruption.

For the time being, we will disable STONITH and configure it later.
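When you do configure fencing later, it is created as a stonith resource; a rough sketch using the fence_ipmilan agent, where the IP address, credentials and host list are placeholders for your environment:

# pcs stonith create ipmi-fence-UA-HA2 fence_ipmilan pcmk_host_list="UA-HA2" ipaddr="192.168.203.250" login="admin" passwd="secret" lanplus=1 op monitor interval=60s
# pcs property set stonith-enabled=true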

 

6. Disable the STONITH (Fencing)

[root@UA-HA ~]# pcs property set stonith-enabled=false
[root@UA-HA ~]# 
[root@UA-HA ~]#  pcs property show stonith-enabled
Cluster Properties:
 stonith-enabled: false
[root@UA-HA ~]#

 

7. Verify the cluster configuration again. The errors should now have disappeared.

[root@UA-HA ~]# crm_verify -L -V
[root@UA-HA ~]#

 

We have successfully configured a two-node redhat cluster on RHEL 7.2 with the new components pacemaker and corosync.  Hope this article is informative to you.

Share it ! Comment it !! Be Sociable !!!

 

The post RHEL 7 – Configuring Pacemaker/Corosync – Redhat Cluster – Part 4 appeared first on UnixArena.

RHEL 7 – Pacemaker – Cluster Resource Agents Overview – Part 5


Resource agents play an important role in cluster management. Resource agents are multi-threaded processes that provide the logic to manage the resources. Pacemaker has one agent per resource type; a resource type could be a file-system, an IP address, a database, a virtual domain and more. The resource agent is responsible for monitoring, starting, stopping, validating, migrating, promoting and demoting the cluster resources whenever required. Most of the resource agents are compliant with the Open Cluster Framework (OCF). Let's add one IP resource to the existing cluster and then get into the detailed explanation of the command options.

 

1. Log in to one of the Redhat cluster (Pacemaker/corosync) nodes as the root user.

 

2. Check the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 13:06:01 2015          Last change: Sun Dec 27 23:59:59 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:


PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

3. Add the IP which needs to be highly available (the clustered IP).

[root@UA-HA ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.203.190 cidr_netmask=24 op monitor interval=30s
[root@UA-HA ~]#

ClusterIP – Resource name (you can give any name)
ocf:heartbeat:IPaddr2 – Resource agent name

 

Resource Standard:

The first field (ocf in this case) is the standard to which the resource script conforms and where to find it.
To obtain a list of the available resource standards, use the following command.

[root@UA-HA ~]# pcs resource standards
ocf   - Open cluster Framework 
lsb   - Linux standard base (legacy init scripts)
service - Based on Linux "service" command. 
systemd  - systemd based service Management
stonith  - Fencing Resource standard. 
[root@UA-HA ~]#

 

Resource Providers:

The second field (heartbeat in this case) is standard-specific; for OCF resources, it tells the cluster which OCF namespace the resource script is in. To obtain a list of the available OCF resource providers, use the following command.

[root@UA-HA ~]# pcs resource providers
heartbeat
openstack
pacemaker
[root@UA-HA ~]#

 

What are the pre-built resource agents available in RHEL 7.2?

The third field (IPaddr2 in this case) is the name of the resource script. To see all the resource agents available for a specific OCF provider (heartbeat) , use the following command.

[root@UA-HA ~]# pcs resource agents ocf:heartbeat
CTDB
Delay
Dummy
Filesystem
IPaddr
IPaddr2
IPsrcaddr
LVM
MailTo
Route
SendArp
Squid
VirtualDomain
Xinetd
apache
clvm
conntrackd
db2
dhcpd
docker
ethmonitor
exportfs
galera
iSCSILogicalUnit
iSCSITarget
iface-vlan
mysql
named
nfsnotify
nfsserver
nginx
oracle
oralsnr
pgsql
postfix
rabbitmq-cluster
redis
rsyncd
slapd
symlink
tomcat
[root@UA-HA ~]# pcs resource agents ocf:heartbeat |wc -l
41
[root@UA-HA ~]#

 

For OpenStack, you have the following resource agents.

[root@UA-HA ~]# pcs resource agents ocf:openstack
NovaCompute
NovaEvacuate
[root@UA-HA ~]#

 

Here is the list of resource agents to manage the pacemaker components.

[root@UA-HA ~]# pcs resource agents ocf:pacemaker
ClusterMon
Dummy
HealthCPU
HealthSMART
Stateful
SysInfo
SystemHealth
controld
ping
pingd
remote
[root@UA-HA ~]#

 

4. Verify the resource status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 13:07:33 2015          Last change: Mon Dec 28 13:07:30 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 1 resource configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

 

As per the cluster status, the IP resource is online on node “UA-HA”. Let's verify from the OS command line.

[root@UA-HA ~]# ip a |grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
    inet 192.168.203.134/24 brd 192.168.203.255 scope global dynamic br0
    inet 192.168.203.190/24 brd 192.168.203.255 scope global secondary br0
    inet6 fe80::84ef:2eff:fee9:260a/64 scope link
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
[root@UA-HA ~]#
[root@UA-HA ~]# ping 192.168.203.190
PING 192.168.203.190 (192.168.203.190) 56(84) bytes of data.
64 bytes from 192.168.203.190: icmp_seq=1 ttl=64 time=0.084 ms
64 bytes from 192.168.203.190: icmp_seq=2 ttl=64 time=0.090 ms
64 bytes from 192.168.203.190: icmp_seq=3 ttl=64 time=0.121 ms
64 bytes from 192.168.203.190: icmp_seq=4 ttl=64 time=0.094 ms
^C
--- 192.168.203.190 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3006ms
rtt min/avg/max/mdev = 0.084/0.097/0.121/0.015 ms
[root@UA-HA ~]#

 

We can see that the IP “192.168.203.190/24” is up and running. This IP will automatically move from one node to another if the active node fails.

The post RHEL 7 – Pacemaker – Cluster Resource Agents Overview – Part 5 appeared first on UnixArena.

RHEL 7 – Pacemaker – Cluster Resources/Group Management – Part 6


In a Pacemaker/Corosync cluster (RHEL 7 HA), resource management and resource group management are important tasks. Depending on the cluster HA services, you might need to configure any number of resources. In most cases, you need to start a set of resources sequentially and stop them in the reverse order. To simplify this configuration, Pacemaker supports the concept of groups (resource groups). For example, to provide a web service in HA mode, you need resources like a file system (to store the website data), an IP (a clustered IP to access the website) and Apache (to provide the web service). To start the Apache service, you need the filesystem which stores the website data, so the resources must start in the following order:

  1. IP
  2. File-system
  3. Apache service

 

Let's see how to configure a highly available Apache service (website) in the Redhat cluster (Pacemaker/Corosync). In the previous article, we already created the IP resource.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 18:24:10 2015          Last change: Mon Dec 28 18:09:30 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 1 resource configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]# pcs resource show ClusterIP
 Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.203.190 cidr_netmask=24
  Operations: start interval=0s timeout=20s (ClusterIP-start-interval-0s)
              stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
              monitor interval=30s (ClusterIP-monitor-interval-30s)
[root@UA-HA ~]#

 

Create the File-system and  Apache resources quickly:

 

Filesystem : 

  • Shared LUN – /dev/sdc
  • Volume Group – webvg
  • Volume – webvol1
  • Filesystem Type – ext4

 

Quick Setup for Filesystem resource: 

[root@UA-HA2 ~]# vgcreate webvg /dev/sdc
[root@UA-HA2 ~]# lvcreate -L 90M -n webvol1 webvg
[root@UA-HA2 ~]# mkfs.ext4 /dev/webvg/webvol1

 

Apache:

  • httpd

Quick Setup:

[root@UA-HA www]# yum install -y httpd

 

Prerequisites for LVM:

(Perform the following changes on both the cluster nodes)

1. Make sure that the “use_lvmetad” parameter is set to “0”. This is mandatory when you use Pacemaker.

[root@UA-HA ~]# grep use_lvmetad /etc/lvm/lvm.conf |grep -v "#"
    use_lvmetad = 0
[root@UA-HA ~]#

 

2. To prevent automatic activation of the cluster-managed volume group, update the volume_list parameter with only the local VGs which need to be activated automatically.

[root@UA-HA ~]# grep volume_list /etc/lvm/lvm.conf |grep -v "#"
        volume_list = [ "nfsvg", "rhel" ]
[root@UA-HA ~]# vgs
  VG    #PV #LV #SN Attr   VSize  VFree
  nfsvg   2   1   0 wz--n-  1.94g 184.00m
  rhel    1   2   0 wz--n- 19.51g      0
  webvg   1   1   0 wz--n- 92.00m      0
[root@UA-HA ~]#

In my case, “webvg” will be managed through the cluster, so it is not listed in volume_list.

 

3. Mount the volume on “/var/www” and create the following directories and files.

[root@UA-HA2 ~]# mount /dev/webvg/webvol1 /var/www
[root@UA-HA2 ~]# cd /var/www
[root@UA-HA2 www]# mkdir error html cgi-bin
[root@UA-HA2 www]# ls -l
total 3
drwxr-xr-x 2 root root 1024 Dec 28 20:26 cgi-bin
drwxr-xr-x 2 root root 1024 Dec 28 20:26 error
drwxr-xr-x 2 root root 1024 Dec 28 20:27 html
[root@UA-HA2 www]# cd html/
[root@UA-HA2 html]# vi index.html
Hello, Welcome to UnixArena 

[root@UA-HA2 html]#

 

4. Rebuild the “initramfs” boot image to guarantee that the boot image will not try to activate a volume group controlled by the cluster. Rebuild the initramfs image using the following command.

[root@UA-HA ~]# dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
[root@UA-HA ~]#

 

5. Reboot the nodes.

 

 

Create the LVM cluster resource (VG) and the file-system cluster resource:

 

1. Create the cluster volume group resource.

[root@UA-HA ~]# pcs resource create vgres LVM volgrpname=webvg exclusive=true
[root@UA-HA ~]# pcs resource show vgres
 Resource: vgres (class=ocf provider=heartbeat type=LVM)
  Attributes: volgrpname=webvg exclusive=true
  Operations: start interval=0s timeout=30 (vgres-start-interval-0s)
              stop interval=0s timeout=30 (vgres-stop-interval-0s)
              monitor interval=10 timeout=30 (vgres-monitor-interval-10)
[root@UA-HA ~]#

vgres – Resource name (any unique name)
webvg – Volume group name

 

2. Create the cluster mount resource.

[root@UA-HA ~]# pcs resource create webvolfs Filesystem  device="/dev/webvg/webvol1" directory="/var/www" fstype="ext4"
[root@UA-HA ~]# pcs resource show webvolfs
 Resource: webvolfs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/webvg/webvol1 directory=/var/www fstype=ext4
  Meta Attrs: 
  Operations: start interval=0s timeout=60 (webvolfs-start-interval-0s)
              stop interval=0s timeout=60 (webvolfs-stop-interval-0s)
              monitor interval=20 timeout=40 (webvolfs-monitor-interval-20)
[root@UA-HA ~]#

 

3. Before adding the apache resource, you must update /etc/httpd/conf/httpd.conf on both the cluster nodes with the following contents. These entries are required for pacemaker to fetch the web-server status.

Update apache conf
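The exact lines are shown in the screenshot above and are not reproduced in this text; a typical snippet (as used in the upstream Pacemaker documentation for the apache agent's status URL) appended to httpd.conf would look like this — treat it as a sketch and adjust to your environment:

# cat >> /etc/httpd/conf/httpd.conf << 'EOF'
<Location /server-status>
    SetHandler server-status
    Require local
</Location>
EOF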

 

4. Check the apache server status (httpd.service). Make sure that httpd.service is stopped and disabled on both the cluster nodes, since this service will be managed by the cluster.

[root@UA-HA ~]# systemctl status httpd.service
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:httpd(8)
           man:apachectl(8)

Dec 27 13:55:52 UA-HA systemd[1]: Starting The Apache HTTP Server...
Dec 27 13:55:55 UA-HA httpd[2002]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.203.134. Set the...is message
Dec 27 13:55:55 UA-HA systemd[1]: Started The Apache HTTP Server.
Dec 27 15:16:02 UA-HA httpd[11786]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.203.134. Set th...is message
Dec 27 15:16:02 UA-HA systemd[1]: Reloaded The Apache HTTP Server.
Dec 28 18:06:57 UA-HA systemd[1]: Started The Apache HTTP Server.
Dec 28 20:30:56 UA-HA systemd[1]: Stopping The Apache HTTP Server...
Dec 28 20:30:57 UA-HA systemd[1]: Stopped The Apache HTTP Server.
Hint: Some lines were ellipsized, use -l to show in full.
[root@UA-HA ~]#

 

5. Create the Apache cluster resource.

[root@UA-HA ~]# pcs resource create webres apache configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status"
[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 20:11:51 2015          Last change: Mon Dec 28 20:11:44 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 vgres  (ocf::heartbeat:LVM):   (target-role:Stopped) Stopped
 webvolfs       (ocf::heartbeat:Filesystem):    (target-role:Stopped) Stopped
 ClusterIP      (ocf::heartbeat:IPaddr2):       Started UA-HA2
 webres (ocf::heartbeat:apache):        Stopped

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

Normally, the resource group would be created when you add the first cluster resource (by specifying “--group” at the end of the command line), which builds the dependency tree as you go. To explain the cluster resource and resource group management concepts separately, I am creating the resource group at the end; a sample of the up-front approach is shown below.
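For reference, a sketch of what the very first resource creation would look like with the group specified up front, reusing the names from this article:

# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.203.190 cidr_netmask=24 op monitor interval=30s --group WEBRG1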

 

If any of these resources have already started, just stop them to avoid errors.

[root@UA-HA ~]# pcs resource disable vgres
[root@UA-HA ~]# pcs resource disable webvolfs
[root@UA-HA ~]# pcs resource disable webres
[root@UA-HA ~]# pcs resource disable ClusterIP
[root@UA-HA ~]# pcs resource
 vgres  (ocf::heartbeat:LVM):                    Stopped
 webvolfs       (ocf::heartbeat:Filesystem):     Stopped
 ClusterIP      (ocf::heartbeat:IPaddr2):        Stopped
 webres (ocf::heartbeat:apache):                 Stopped
[root@UA-HA ~]#

 

6. Create the resource group to establish the resource dependencies, so that the resources stop and start in sequence.

[root@UA-HA ~]# pcs resource group add WEBRG1 ClusterIP vgres webvolfs webres

 

As per the above command, here is the resource start-up sequence:

  1. ClusterIP – Website URL
  2. vgres – Volume Group
  3. webvolfs – Mount Resource
  4. webres – httpd Resource

 

The stop sequence is just the reverse of the start sequence:

  1. webres – httpd Resource
  2. webvolfs – Mount Resource
  3. vgres – Volume Group
  4. ClusterIP – Website URL

 

7. Check the resource status. You should see that all the resources are now bundled into one resource group named “WEBRG1”.

[root@UA-HA ~]# pcs resource
 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):            Stopped
     webvolfs   (ocf::heartbeat:Filesystem):     Stopped
     webres     (ocf::heartbeat:apache):         Stopped
[root@UA-HA ~]#

 

8. Enable the disabled resources in the following sequence.

[root@UA-HA ~]# pcs resource enable ClusterIP
[root@UA-HA ~]# pcs resource enable vgres
[root@UA-HA ~]# pcs resource enable webvolfs
[root@UA-HA ~]# pcs resource enable webres

 

9. Verify the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 20:54:43 2015          Last change: Mon Dec 28 20:51:30 2015 by root via crm_resource on UA-HA2
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

10. Let's move the resources from UA-HA2 to UA-HA. In this case, we do not need to move each resource manually; we just need to move the resource group, since we have bundled the required resources into it.

[root@UA-HA ~]# pcs resource move WEBRG1 UA-HA
[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 20:58:55 2015          Last change: Mon Dec 28 20:58:41 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

You should be able to see a webpage like the following.

Website Portal

 

11. How do you stop a pacemaker resource group? Just disable the resource group.

[root@UA-HA2 ~]# pcs resource disable WEBRG1
[root@UA-HA2 ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 21:12:18 2015          Last change: Mon Dec 28 21:12:14 2015 by root via crm_resource on UA-HA2
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       (target-role:Stopped) Stopped
     vgres      (ocf::heartbeat:LVM):   (target-role:Stopped) Stopped
     webvolfs   (ocf::heartbeat:Filesystem):    (target-role:Stopped) Stopped
     webres     (ocf::heartbeat:apache):        (target-role:Stopped) Stopped

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA2 ~]#

 

12. How do you start the resource group again? Use the enable option for the RG.

[root@UA-HA2 ~]# pcs resource enable WEBRG1
[root@UA-HA2 ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 21:14:04 2015          Last change: Mon Dec 28 21:14:01 2015 by root via crm_resource on UA-HA2
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA2 ~]#

 

Note:
A Redhat cluster (Pacemaker/corosync) has many parameters, like resource stickiness and failure counts. These attributes play a role in deciding where the resources are started.
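A sketch of how these attributes can be viewed and tuned (the stickiness value of 100 is only an example):

# pcs resource defaults resource-stickiness=100
# pcs resource defaults
# pcs resource failcount show webres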

 

To clear resource errors, use the following command.

# pcs resource cleanup 

 

To remove the location constraints created by the “pcs resource move” or “pcs resource ban” commands, use the following commands.

 [root@UA-HA2 ~]# pcs resource clear ClusterIP
[root@UA-HA2 ~]# pcs resource clear vgres
[root@UA-HA2 ~]# pcs resource clear webvolfs
[root@UA-HA2 ~]# pcs resource clear webres
[root@UA-HA2 ~]#

 

Hope this article is informative to you.

 

Share it ! Comment it !! Be Sociable !!!

The post RHEL 7 – Pacemaker – Cluster Resources/Group Management – Part 6 appeared first on UnixArena.


RHEL 7 – Pacemaker – Configuring HA KVM guest – Part 7


If you have followed the KVM article series on UnixArena, you might have read the article which talks about KVM guest live migration. KVM supports guest live migration (similar to VMware vMotion), but to provide high availability you need a cluster setup (like VMware HA). In this article, we will configure a KVM guest as a cluster resource with live migration support. If you move the KVM guest resource manually, the cluster performs a live migration; if any hardware or hypervisor failure happens on the KVM host, the guest will be started on the available cluster node (with minimal downtime). I will be using the existing KVM and redhat cluster setup to demonstrate this.

 

  • KVM Hyper-visor – RHEL 7.2
  • Redhat cluster Nodes – UA-HA & UA-HA2
  • Shared storage – NFS (as an alternative, you can also use GFS2)
  • KVM guest – UAKVM2

 

HA KVM guest using Pacemaker

 

1. Log in to one of the cluster nodes and halt the KVM guest.

[root@UA-HA ~]# virsh shutdown UAKVM2
[root@UA-HA ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     UAKVM2                         shut off

[root@UA-HA ~]#

 

2. Copy the guest domain configuration file (XML) to the NFS path.

[root@UA-HA qemu_config]# cd /etc/libvirt/qemu/
[root@UA-HA qemu]# ls -lrt
total 8
drwx------. 3 root root   40 Dec 14 09:13 networks
drwxr-xr-x. 2 root root    6 Dec 16 16:16 autostart
-rw-------  1 root root 3676 Dec 23 02:52 UAKVM2.xml
[root@UA-HA qemu]#
[root@UA-HA qemu]# cp UAKVM2.xml /kvmpool/qemu_config
[root@UA-HA qemu]# ls -lrt /kvmpool/qemu_config
total 4
-rw------- 1 root root 3676 Dec 23 08:14 UAKVM2.xml
[root@UA-HA qemu]#
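If the guest definition might have changed since it was last saved, you could instead dump the live definition straight to the NFS path; a sketch:

# virsh dumpxml UAKVM2 > /kvmpool/qemu_config/UAKVM2.xml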

 

3. Undefine the KVM virtual guest (so that it can be managed as a cluster resource).

[root@UA-HA qemu]# virsh undefine UAKVM2
Domain UAKVM2 has been undefined

[root@UA-HA qemu]# virsh list --all
 Id    Name                           State
----------------------------------------------------

[root@UA-HA qemu]#

 

4. Check the pacemaker cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 22:44:59 2015          Last change: Mon Dec 28 21:16:56 2015 by root via crm_resource on UA-HA2
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

5. To manage the KVM guest, you need to use the resource agent called “VirtualDomain”. Let's create a new VirtualDomain resource using the UAKVM2.xml file that we stored in /kvmpool/qemu_config.

[root@UA-HA ~]# pcs resource create UAKVM2_res VirtualDomain hypervisor="qemu:///system" config="/kvmpool/qemu_config/UAKVM2.xml" migration_transport=ssh op start timeout="120s" op stop timeout="120s" op monitor  timeout="30" interval="10"  meta allow-migrate="true" priority="100" op migrate_from interval="0" timeout="120s" op migrate_to interval="0" timeout="120" --group UAKVM2
[root@UA-HA ~]#

 

6. Check the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 22:51:36 2015          Last change: Mon Dec 28 22:51:36 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

7. The KVM guest “UAKVM2” should have been defined and started automatically. Check the running VM using the following command.

[root@UA-HA ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 2     UAKVM2                         running

[root@UA-HA ~]#

 

8. Pacemaker also supports live KVM guest migration. To migrate the KVM guest to the other KVM host on the fly, use the following command.

[root@UA-HA ~]# pcs resource move UAKVM2 UA-HA2
[root@UA-HA ~]#

In the above command,

UAKVM2 refers to the resource group name and UA-HA2 refers to the target cluster node name.

 

9. Check the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Mon Dec 28 22:54:51 2015          Last change: Mon Dec 28 22:54:38 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

10. List the VMs using the virsh command. You can see that the VM has moved from UA-HA to UA-HA2.

[root@UA-HA ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------

[root@UA-HA ~]# ssh UA-HA2 virsh list
 Id    Name                           State
----------------------------------------------------
 2     UAKVM2                         running

[root@UA-HA ~]#

During this migration, you will not even notice a single packet drop. That’s really cool.

 

Hope this article is informative to you . Share it ! Comment it !! Be Sociable !!!

The post RHEL 7 – Pacemaker – Configuring HA KVM guest – Part 7 appeared first on UnixArena.

RHEL 7 – Pacemaker – Cluster Node Management – Part 8


This article demonstrates Pacemaker/Corosync cluster membership, node management and other cluster operational tasks. Periodically, you might need to take a cluster node offline to perform maintenance activities like OS package updates/upgrades, hardware replacements/upgrades, etc. In such cases, you need to put the cluster node into standby mode to keep the cluster operational on the other node and avoid quorum issues (in the case of a two-node cluster). The cluster standby option is persistent across a cluster node reboot, so we do not need to worry about automatic resource start-up until we unstandby the node.

In the last section, we will look at the cluster maintenance mode, which is completely different from the node standby and unstandby operations. Cluster maintenance mode is the preferred method if you are making online changes on the cluster nodes.
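For reference, cluster-wide maintenance mode is toggled through a cluster property; a sketch (set it back to false once the work is done):

# pcs property set maintenance-mode=true
# pcs property show maintenance-mode
# pcs property set maintenance-mode=false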

The pre-configured resources are vgres (LVM volume group), webvolfs (file system), ClusterIP (HA IP address for the website), webres (Apache) and UAKVM2_res (HA KVM guest).

[root@UA-HA ~]# pcs resource
 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2
[root@UA-HA ~]#

 

Cluster nodes are UA-HA & UA-HA2.

[root@UA-HA ~]# pcs cluster status
Cluster Status:
 Last updated: Sat Oct 17 11:58:23 2015         Last change: Sat Oct 17 11:57:48 2015 by root via crm_attribute on UA-HA
 Stack: corosync
 Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
 2 nodes and 5 resources configured
 Online: [ UA-HA UA-HA2 ]

PCSD Status:
  UA-HA: Online
  UA-HA2: Online
[root@UA-HA ~]#

 

Move a Cluster Node into Standby Mode:

1. Log in to one of the cluster nodes as the root user and check the node status.

[root@UA-HA ~]# pcs status nodes
Pacemaker Nodes:
 Online: UA-HA UA-HA2
 Standby:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Offline:
[root@UA-HA ~]#

 

2. Verify the cluster status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 12:00:35 2015          Last change: Sat Oct 17 11:57:48 2015 by root via crm_attribute on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

3. You can also use crm_mon to monitor the cluster status in real time.

[root@UA-HA ~]# crm_mon
Last updated: Sat Oct 17 12:05:50 2015          Last change: Sat Oct 17 12:04:28 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

 

To terminate crm_mon, press Ctrl+C.

[root@UA-HA ~]# crm_mon
Connection to the CIB terminated
[root@UA-HA ~]#

 

4. To move a specific node into standby mode, use the following command.

[root@UA-HA ~]# pcs cluster standby UA-HA2
[root@UA-HA ~]#

 

Check the cluster status again,

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 12:09:35 2015          Last change: Sat Oct 17 12:09:23 2015 by root via crm_attribute on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Node UA-HA2: standby
Online: [ UA-HA ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

You can see that resource group “UAKVM2” was automatically moved from UA-HA2 to UA-HA. You can now perform the maintenance activity on UA-HA2 without worrying about cluster membership and automatic resource start-up.

 

5. Check the cluster membership status. (Quorum status).

[root@UA-HA ~]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         2          1 UA-HA2
         1          1 UA-HA (local)
[root@UA-HA ~]#

OR

[root@UA-HA ~]# corosync-quorumtool
Quorum information
------------------
Date:             Sat Oct 17 12:15:54 2015
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          2296
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
    Nodeid      Votes Name
         2          1 UA-HA2
         1          1 UA-HA (local)
[root@UA-HA ~]#

 

Even though node UA-HA2 is in standby mode, it still provides its vote to the cluster. If you halt node “UA-HA2” for the maintenance activity, the quorum status changes as shown below.

[root@UA-HA ~]# corosync-quorumtool
Quorum information
------------------
Date:             Sat Oct 17 12:16:25 2015
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          1
Ring ID:          2300
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           1
Flags:            2Node Quorate WaitForAll

Membership information
----------------------
    Nodeid      Votes Name
         1          1 UA-HA (local)
[root@UA-HA ~]#
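The “2Node” and “WaitForAll” flags come from the votequorum settings that pcs generates for a two-node cluster; a sketch of how to inspect them (a pcs-generated two-node config typically contains something like this):

# grep -A3 "^quorum" /etc/corosync/corosync.conf
quorum {
    provider: corosync_votequorum
    two_node: 1
}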

 

Clear the Standby Mode:

1. Once the maintenance on UA-HA2 is completed, just unstandby it to make the cluster node available for operation again.

[root@UA-HA ~]# pcs cluster unstandby UA-HA2
[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 12:29:21 2015          Last change: Sat Oct 17 12:29:19 2015 by root via crm_attribute on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

2. You can now move the desired resource group back to UA-HA2.

[root@UA-HA ~]# pcs resource move UAKVM2 UA-HA2
[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 12:32:05 2015          Last change: Sat Oct 17 12:29:19 2015 by root via crm_attribute on UA-HA
Stack: corosync
Current DC: UA-HA (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

We have successfully put node “UA-HA2” into standby mode and reverted it back.

 

How to stop/start the cluster services on a specific node?

1.Check the cluster status.

[root@UA-HA log]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 16:53:02 2015          Last change: Sat Oct 17 16:52:21 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA log]#

 

2.Let’s plan to stop the cluster services on UA-HA. As per the cluster status, group “UAKVM2” is running on UA-HA.

 

3.Stop the cluster services on UA-HA and let’s see what happens to the group. From UA-HA node, execute the following command.

[root@UA-HA log]# pcs cluster stop
Stopping Cluster (pacemaker)... Stopping Cluster (corosync)...
[root@UA-HA log]# pcs status
Error: cluster is not currently running on this node
[root@UA-HA log]#

 

Since the cluster services (pacemaker & corosync) are stopped on UA-HA, you can't check the cluster status from that node. Let's check from the UA-HA2 node.

[root@UA-HA log]# ssh UA-HA2 pcs status
Cluster name: UABLR
Last updated: Sun Jan 10 12:13:52 2016          Last change: Sun Jan 10 12:05:47 2016 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA2 ]
OFFLINE: [ UA-HA ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA log]#

 

Group "UAKVM2" has been automatically moved to UA-HA2. What happens if you start the cluster services on UA-HA again?

[root@UA-HA log]# pcs cluster start
Starting Cluster...
[root@UA-HA log]# pcs constraint
Location Constraints:
Ordering Constraints:
Colocation Constraints:
[root@UA-HA log]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 17:03:45 2015          Last change: Sun Jan 10 12:05:47 2016 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA log]#

Group UAKVM2 automatically moves back to UA-HA.
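
This fail-back happens because there are no location constraints in place and the default resource-stickiness is 0, so pacemaker is free to redistribute resources when UA-HA rejoins the cluster. If you prefer resources to stay where they are currently running, you could set a non-zero stickiness; a minimal sketch (the value 100 is only an example):

[root@UA-HA log]# pcs resource defaults resource-stickiness=100
[root@UA-HA log]# pcs resource defaults
resource-stickiness: 100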

 

If you do not want the resource group to fail back automatically,

1. "Ban" the resource group on the node where you plan to stop the cluster services.

[root@UA-HA log]# pcs resource ban UAKVM2 UA-HA
Warning: Creating location constraint cli-ban-UAKVM2-on-UA-HA with a score of -INFINITY for resource UAKVM2 on node UA-HA.
This will prevent UAKVM2 from running on UA-HA until the constraint is removed. This will be the case even if UA-HA is the last node in the cluster.

 

2. The resource group will automatically move to the other node in the cluster.

[root@UA-HA log]# pcs status
Cluster name: UABLR
Last updated: Sat Oct 17 17:18:25 2015          Last change: Sat Oct 17 17:17:48 2015 by root via crm_resource on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA log]#

 

3. The cluster creates a location constraint to prevent the group from starting on that node.

[root@UA-HA log]# pcs constraint
Location Constraints:
  Resource: UAKVM2
    Disabled on: UA-HA (score:-INFINITY) (role: Started)
Ordering Constraints:
Colocation Constraints:

 

4. Stop the cluster services on that node and perform the maintenance.

5. Start the cluster services again.

6. At the desired time, remove the ban constraint and move the resource group back to the node, as shown in the sketch below.
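
Putting steps 4 to 6 together, the sequence might look like this sketch ("pcs resource clear" removes the cli-ban constraint created in step 1; alternatively, you could run "pcs constraint remove cli-ban-UAKVM2-on-UA-HA"):

[root@UA-HA log]# pcs cluster stop               # on UA-HA, before the maintenance
[root@UA-HA log]# pcs cluster start              # on UA-HA, after the maintenance
[root@UA-HA log]# pcs resource clear UAKVM2      # remove the ban constraint
[root@UA-HA log]# pcs resource move UAKVM2 UA-HA

Keep in mind that "pcs resource move" itself creates a cli-prefer location constraint, which can also be removed later with "pcs resource clear UAKVM2".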

 

Cluster Maintenance Mode: (Online)

If you would like to perform software upgrades or configuration changes that impact the cluster resources, you need to put the cluster into maintenance mode. All the resources are then tagged as unmanaged by pacemaker, which means pacemaker monitoring is turned off and no action is taken by the cluster until you remove the maintenance mode. This is a useful feature for upgrading cluster components and performing other resource changes.
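
Note: If only a single resource needs to be modified, you don't necessarily have to put the whole cluster into maintenance mode; a hedged alternative is to unmanage just that resource (using the "webres" resource from this setup as an example):

[root@UA-HA ~]# pcs resource unmanage webres
[root@UA-HA ~]# pcs resource manage webres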

1. To move the cluster into maintenance mode, use the following command.

[root@UA-HA ~]# pcs property set maintenance-mode=true

 

2. Check the Cluster Property

[root@UA-HA ~]# pcs property list
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: UABLR
dc-version: 1.1.13-10.el7-44eb2dd
have-watchdog: false
last-lrm-refresh: 1452507397
maintenance-mode: true
stonith-enabled: false

 

3. Check the cluster status. Resources are set to unmanaged Flag.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sun Oct 18 12:19:33 2015 Last change: Sun Oct 18 12:19:27 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2 (unmanaged)
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2 (unmanaged)
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2 (unmanaged)
     webres     (ocf::heartbeat:apache):        Started UA-HA2 (unmanaged)
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA (unmanaged)

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

4. Resources continue to run even though you have stopped the cluster services.

[root@UA-HA ~]# pcs cluster stop --all
UA-HA: Stopping Cluster (pacemaker)...
UA-HA2: Stopping Cluster (pacemaker)...
UA-HA2: Stopping Cluster (corosync)...
UA-HA: Stopping Cluster (corosync)...
[root@UA-HA ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 55    UAKVM2                         running

[root@UA-HA ~]#

Perform the maintenance activity which can be done without rebooting the system.

 

5. Start the cluster services.

[root@UA-HA ~]# pcs cluster start --all
UA-HA2: Starting Cluster...
UA-HA: Starting Cluster...
[root@UA-HA ~]#

 

6. Resources should still show as unmanaged & online.

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2 (unmanaged)
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2 (unmanaged)
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2 (unmanaged)
     webres     (ocf::heartbeat:apache):        Started UA-HA2 (unmanaged)
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA (unmanaged)

 

7. Clear the Maintenance mode.

[root@UA-HA ~]# pcs property set maintenance-mode=false

OR

[root@UA-HA ~]# pcs property unset maintenance-mode
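
Either way, you can quickly confirm that the property is back to its default value of "false" (a simple check):

[root@UA-HA ~]# pcs property --all | grep maintenance-mode
 maintenance-mode: false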

 

8. Verify the resource status.

[root@UA-HA ~]# pcs status
Cluster name: UABLR
Last updated: Sun Oct 18 12:41:59 2015          Last change: Sun Oct 18 12:41:51 2015 by root via cibadmin on UA-HA
Stack: corosync
Current DC: UA-HA2 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 5 resources configured

Online: [ UA-HA UA-HA2 ]

Full list of resources:

 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA2
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA2
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA2
     webres     (ocf::heartbeat:apache):        Started UA-HA2
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA

PCSD Status:
  UA-HA: Online
  UA-HA2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@UA-HA ~]#

 

Hope this article is informative to you. Share it ! Comment it !! Be Sociable !!!

The post RHEL 7 – Pacemaker – Cluster Node Management – Part 8 appeared first on UnixArena.

RHEL 7 – Pacemaker – Configure Redundant Corosync Links on Fly– Part 10


Corosync cluster engine provides reliable inter-node communication between the cluster nodes. It keeps the cluster configuration in sync across the cluster nodes at all times. It also maintains the cluster membership and notifies when quorum is achieved or lost. It provides the messaging layer inside the cluster to manage system and resource availability. In Veritas cluster, this functionality is provided by LLT + GAB (Low Latency Transport + Global Atomic Broadcast). Unlike Veritas cluster, corosync uses the existing network interfaces to communicate with the cluster nodes.

 

Why do we need redundant corosync links?

By default, we configure network bonding by aggregating a couple of physical network interfaces for the primary node IP. In the default configuration, corosync uses this interface as the heartbeat link. If there is a network issue and the nodes lose connectivity with each other, the cluster may face a split-brain situation. To avoid split brain, we configure additional network links. This additional link should go through a different network switch, or a direct network cable can be used between the two nodes.

Note: For tutorial simplicity, we will use unicast (not multicast) for corosync. The unicast method should be fine for two-node clusters.

 

Configuring the additional corosync links is an online activity and can be done without impacting the services.

 

Let’s explore the existing configuration:

1. View the corosync configuration using pcs command.

[root@UA-HA ~]# pcs cluster corosync
totem {
    version: 2
    secauth: off
    cluster_name: UABLR
    transport: udpu
}

nodelist {
    node {
        ring0_addr: UA-HA
        nodeid: 1
    }

    node {
        ring0_addr: UA-HA2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

[root@UA-HA ~]#

 

2. Corosync uses two UDP ports mcastport (for mcast receives) and mcastport – 1 (for mcast sends).

  • mcast receives: 5405
  • mcast sends: 5404
[root@UA-HA ~]# netstat -plantu | grep 54 |grep corosync
udp        0      0 192.168.203.134:5405    0.0.0.0:*                           34363/corosync
[root@UA-HA ~]#
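
If firewalld is active between the nodes, these ports must be permitted. A minimal sketch, assuming the stock RHEL 7 firewalld setup where the predefined "high-availability" service covers the corosync and pcsd ports:

[root@UA-HA ~]# firewall-cmd --permanent --add-service=high-availability
[root@UA-HA ~]# firewall-cmd --reload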

 

3. Corosync configuration file is located in /etc/corosync.

[root@UA-HA ~]# cat /etc/corosync/corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: UABLR
    transport: udpu
}

nodelist {
    node {
        ring0_addr: UA-HA
        nodeid: 1
    }

    node {
        ring0_addr: UA-HA2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
[root@UA-HA ~]#

 

4. Verify current ring Status using corosync-cfgtool.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = ring 0 active with no faults
[root@UA-HA ~]# ssh UA-HA2 corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
        id      = 192.168.203.131
        status  = ring 0 active with no faults
[root@UA-HA ~]#

 

As we can see, only one ring has been configured for corosync, and it uses the following interface on each node.

[root@UA-HA ~]# ifconfig br0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.203.134  netmask 255.255.255.0  broadcast 192.168.203.255
        

[root@UA-HA ~]# ssh UA-HA2 ifconfig br0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.203.131  netmask 255.255.255.0  broadcast 192.168.203.255
        
[root@UA-HA ~]#

 

Configure a new ring :

 

5. To add additional redundancy for corosync links, we will use the following interface on both nodes.

[root@UA-HA ~]# ifconfig eno33554984
eno33554984: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.0.3  netmask 255.255.255.0  broadcast 172.16.0.255
        
[root@UA-HA ~]# ssh UA-HA2 ifconfig eno33554984
eno33554984: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.0.2  netmask 255.255.255.0  broadcast 172.16.0.255
       
[root@UA-HA ~]#

Dedicated Private address for Corosync Links:
172.16.0.3 – UA-HA-HB2
172.16.0.2 – UA-HA2-HB2

 

6. Before making changes to the corosync configuration, we need to put the cluster into maintenance mode.

[root@UA-HA ~]# pcs property set maintenance-mode=true
[root@UA-HA ~]# pcs property show maintenance-mode
Cluster Properties:
 maintenance-mode: true
[root@UA-HA ~]#

 

This eventually puts the resources into an unmanaged state.

[root@UA-HA ~]# pcs resource
 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA (unmanaged)
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA (unmanaged)
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA (unmanaged)
     webres     (ocf::heartbeat:apache):        Started UA-HA (unmanaged)
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2 (unmanaged)
[root@UA-HA ~]#

 

7. Update /etc/hosts with the following entries on both nodes.

[root@UA-HA corosync]# cat /etc/hosts |grep HB2
172.16.0.3     UA-HA-HB2
172.16.0.2     UA-HA2-HB2
[root@UA-HA corosync]#

 

8. Update the corosync.conf with rrp_mode & ring1_addr.

[root@UA-HA corosync]# cat corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: UABLR
    transport: udpu
    rrp_mode: active
}

nodelist {
    node {
        ring0_addr: UA-HA
        ring1_addr: UA-HA-HB2
        nodeid: 1
    }

    node {
        ring0_addr: UA-HA2
        ring1_addr: UA-HA2-HB2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
[root@UA-HA corosync]#

 

Here is the difference between the previous configuration file and the new one.

[root@UA-HA corosync]# sdiff -s corosync.conf corosync.conf_back
   rrp_mode: active                                           <
        ring1_addr: UA-HA-HB2                                 <
        ring1_addr: UA-HA2-HB2                                <
[root@UA-HA corosync]#
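
Note: The updated corosync.conf must be identical on both nodes before the services are restarted. If you have edited the file only on UA-HA, copy it across first; a simple sketch using scp:

[root@UA-HA corosync]# scp /etc/corosync/corosync.conf root@UA-HA2:/etc/corosync/corosync.conf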

 

9. Restart the corosync services on both the nodes.

[root@UA-HA ~]# systemctl restart corosync
[root@UA-HA ~]# ssh UA-HA2 systemctl restart corosync

 

10. Check the corosync service status.

[root@UA-HA ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2015-10-19 02:38:16 EDT; 16s ago
  Process: 36462 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
  Process: 36470 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
 Main PID: 36477 (corosync)
   CGroup: /system.slice/corosync.service
           └─36477 corosync

Oct 19 02:38:15 UA-HA corosync[36477]:  [QUORUM] Members[2]: 2 1
Oct 19 02:38:15 UA-HA corosync[36477]:  [MAIN  ] Completed service synchronization, ready to provide service.
Oct 19 02:38:16 UA-HA systemd[1]: Started Corosync Cluster Engine.
Oct 19 02:38:16 UA-HA corosync[36470]: Starting Corosync Cluster Engine (corosync): [  OK  ]
Oct 19 02:38:24 UA-HA corosync[36477]:  [TOTEM ] A new membership (192.168.203.134:3244) was formed. Members left: 2
Oct 19 02:38:24 UA-HA corosync[36477]:  [QUORUM] Members[1]: 1
Oct 19 02:38:24 UA-HA corosync[36477]:  [MAIN  ] Completed service synchronization, ready to provide service.
Oct 19 02:38:25 UA-HA corosync[36477]:  [TOTEM ] A new membership (192.168.203.131:3248) was formed. Members joined: 2
Oct 19 02:38:26 UA-HA corosync[36477]:  [QUORUM] Members[2]: 2 1
Oct 19 02:38:26 UA-HA corosync[36477]:  [MAIN  ] Completed service synchronization, ready to provide service.
[root@UA-HA ~]#

 

11. Verify the corosync configuration using pcs command.

[root@UA-HA ~]# pcs cluster corosync
totem {
    version: 2
    secauth: off
    cluster_name: UABLR
    transport: udpu
   rrp_mode: active
}

nodelist {
    node {
        ring0_addr: UA-HA
        ring1_addr: UA-HA-HB2
        nodeid: 1
    }

    node {
        ring0_addr: UA-HA2
        ring1_addr: UA-HA2-HB2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

[root@UA-HA ~]#

 

12.Verify the ring status.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = ring 0 active with no faults
RING ID 1
        id      = 172.16.0.3
        status  = ring 1 active with no faults
[root@UA-HA ~]# ssh UA-HA2 corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
        id      = 192.168.203.131
        status  = ring 0 active with no faults
RING ID 1
        id      = 172.16.0.2
        status  = ring 1 active with no faults
[root@UA-HA ~]#

 

You can also check the ring status using the following command.

[root@UA-HA ~]# corosync-cmapctl |grep member
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.203.134) r(1) ip(172.16.0.3)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.203.131) r(1) ip(172.16.0.2)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
[root@UA-HA ~]#

We have successfully configured redundant rings  for corosync .

 

13. Clear the cluster maintenance mode.

[root@UA-HA ~]# pcs property unset maintenance-mode

or 

[root@UA-HA ~]#  pcs property set maintenance-mode=false

[root@UA-HA ~]# pcs resource
 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2
[root@UA-HA ~]#

 

Let’s break it !!

You can easily test the rrp_mode by pulling the network cable from one of the configured interfaces. I have simply used the "ifconfig br0 down" command on the UA-HA2 node to simulate this test, assuming that the application/DB is using a different interface.

[root@UA-HA ~]# ping UA-HA2
PING UA-HA2 (192.168.203.131) 56(84) bytes of data.
^C
--- UA-HA2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1002ms

[root@UA-HA ~]#

 

Check the ring status. We can see that ring 0 has been marked as faulty.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = Marking ringid 0 interface 192.168.203.134 FAULTY
RING ID 1
        id      = 172.16.0.3
        status  = ring 1 active with no faults
[root@UA-HA ~]#

 

You can see that the cluster is running perfectly without any issue.

[root@UA-HA ~]# pcs resource
 Resource Group: WEBRG1
     vgres      (ocf::heartbeat:LVM):   Started UA-HA
     webvolfs   (ocf::heartbeat:Filesystem):    Started UA-HA
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started UA-HA
     webres     (ocf::heartbeat:apache):        Started UA-HA
 Resource Group: UAKVM2
     UAKVM2_res (ocf::heartbeat:VirtualDomain): Started UA-HA2
[root@UA-HA ~]#

 

Bring the br0 interface back up using "ifconfig br0 up". Ring 0 is back online.

[root@UA-HA ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.203.134
        status  = ring 0 active with no faults
RING ID 1
        id      = 172.16.0.3
        status  = ring 1 active with no faults
[root@UA-HA ~]#

Hope this article is informative to you. Share it ! Comment it !! Be Sociable !!!

The post RHEL 7 – Pacemaker – Configure Redundant Corosync Links on Fly– Part 10 appeared first on UnixArena.

RHEL 7 – Accessing the Pacemaker WEB UI (GUI) – Part 11


Pacemaker offers a web-based user interface to manage the cluster. It also provides an interface to manage multiple clusters in a single web UI. We can't really say that the web UI has all the options to manage the cluster; I would say the command line is much easier and simpler compared to the GUI. However, you could give the pacemaker web UI a try. It uses port 2224, and you can access the web UI portal using "https://nodename:2224".
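
Before opening the browser, you can quickly confirm that pcsd is listening on port 2224 from any cluster node (a simple check):

[root@UA-HA ~]# ss -tlnp | grep 2224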

The web UI is limited to the following tasks:

  • Create a new cluster
  • Add an existing cluster to the GUI
  • Manage the cluster nodes (stop, start, standby)
  • Configure the fence devices
  • Configure the cluster resources
  • Resource attributes (order, location, colocation, meta attributes)
  • Set the cluster properties
  • Create roles

I don't see any option to switch the resources over from one node to another. There is also no way to verify & configure the corosync rings.

 

Let’s access the web UI portal of pacemaker.

1. No additional setup is required to access the pacemaker web UI from the cluster nodes. By default, the pcs packages are installed as part of the cluster package installation.

 

2. pcsd.service is responsible for the web UI.

[root@UA-HA ~]# systemctl status pcsd
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2015-10-19 14:46:06 EDT; 2s ago
 Main PID: 55297 (pcsd)
   CGroup: /system.slice/pcsd.service
           ├─55297 /bin/sh /usr/lib/pcsd/pcsd start
           ├─55301 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
           ├─55302 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
           └─55315 python2 /usr/lib/pcsd/systemd-notify-fix.py

Oct 19 14:46:01 UA-HA systemd[1]: Starting PCS GUI and remote configuration interface...
Oct 19 14:46:06 UA-HA systemd[1]: Started PCS GUI and remote configuration interface.
[root@UA-HA ~]#

 

3. The pcsd configuration daemon uses an account called "hacluster". We have already set its password during the initial cluster setup.

 

4. Let's launch the pacemaker web UI. You can use any one of the node IP addresses to access it.

Pacemaker Corosync Web UI

 

5. Login with “hacluster” user credentials .

 

6. By default, there won't be any cluster added to the portal. Since we have a configured cluster, let's add it to the web UI. Click the "+ Add Existing" link.

Pacemaker Web UI – Add Cluster

 

7. Enter one of the cluster node IP addresses and click "Add Existing". This process will automatically pull the cluster information into the web UI.

Add the pacemaker cluster to Web UI

 

In the same way, you can add any number of clusters to the single web UI, so that you can manage all the clusters from one place.

 

8. Select the cluster which you would like to manage using Web UI.

Select the cluster

 

 

9. By default, it will take you to the "Nodes" tab.

Pacemaker Corosync Node status

 

Here you can see the following options:

  • Stop/start/restart the cluster services on a specific node
  • Move the node into standby mode
  • Configure fencing

 

10. Have a look at the resource management tab.

Pacemaker Resource Management tab

 

 

11. The next tab is exclusively for configuring & managing fencing.

 

12. The ACLs tab provides an option to create roles with custom rules (for example, providing read-only access to a set of users or groups).

 

13. In the "cluster properties" tab, you can find the following options.

Cluster properties

 

14. The last tab takes you back to the cluster list (the screen shown in step 8).

 

I personally felt that the pacemaker web UI is limited to specific tasks. The pacemaker (pcs) command line looks simple and powerful.

Hope this article is informative to you. Share it ! Comment it ! Be Sociable !!!

The post RHEL 7 – Accessing the Pacemaker WEB UI (GUI) – Part 11 appeared first on UnixArena.

RHEL 7 – How to configure the Fencing on Pacemaker ?


Fencing (STONITH) is an important mechanism in a cluster to avoid data corruption on shared storage. It also helps to bring the cluster back into a known state when a split brain occurs between the nodes. Cluster nodes talk to each other over communication channels, which are typically standard network connections such as Ethernet. Each resource and node has a "state" (e.g. started, stopped) in the cluster, and the nodes report every change that happens to a resource. This reporting works well until communication breaks between the nodes. Fencing comes into play when nodes can't communicate with each other: the majority of nodes forms the cluster based on quorum votes, and the remaining nodes are rebooted or halted based on the fencing action we have defined.

 

There are two type of fencing available in pacemaker.

  • Resource Level Fencing
  • Node Level Fencing

With resource-level fencing, the cluster can make sure that a node cannot access a given resource, so the same resource is never active on both nodes. Node-level fencing makes sure that a node does not run any resources at all. This is usually done in a very simple, yet brutal way: the node is simply reset using a power switch. This may ultimately be necessary because the node may not be responsive at all. In a Pacemaker/corosync cluster, this fencing method is called "STONITH" (Shoot The Other Node In The Head).

 

For more information, please visit clusterlabs.org. Here we will look at node-level fencing.
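
Before configuring it, you can list the fence agents available on the cluster nodes and inspect the parameters of a specific agent; a quick sketch using pcs:

[root@Node1-LAB ~]# pcs stonith list
[root@Node1-LAB ~]# pcs stonith describe fence_xvm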

 

Have a look at the cluster setup.

[root@Node1-LAB ~]# pcs status
Cluster name: GFSCLUS
Last updated: Wed Jan 20 12:43:36 2016
Last change: Wed Jan 20 09:57:06 2016 via cibadmin on Node1
Stack: corosync
Current DC: Node1 (1) - partition with quorum
Version: 1.1.10-29.el7-368c726
2 Nodes configured
2 Resources configured


Online: [ Node1 Node2 ]

PCSD Status:
  Node1: Online
  Node2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@Node1-LAB ~]#

 

In this article, we will see how to configure stonith/fencing using the "fence_xvm" agent for KVM cluster nodes. The purpose of this setup is to demonstrate STONITH/fencing.

 

Environment:  (Demo Purpose only)

  • Node 1 & Node 2  – Pacemaker/corosync cluster
  • UNIXKB-CP  – KVM host which hosts Node1 & Node2

 

Configure KVM host to use fence_xvm:

1.Login to the KVM host.

2.List the running virtual Machines .

[root@UNIXKB-CP ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 6     Node1                          running
 7     Node2                          running

[root@UNIXKB-CP ~]#

 

3.Install the required fencing packages on KVM host (Non-Cluster node)

[root@UNIXKB-CP ~]# yum install fence-virt fence-virtd fence-virtd-libvirt fence-virtd-multicast fence-virtd-serial
Loaded plugins: langpacks, product-id, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Package fence-virt-0.3.0-16.el7.x86_64 already installed and latest version
Package fence-virtd-0.3.0-16.el7.x86_64 already installed and latest version
Package fence-virtd-libvirt-0.3.0-16.el7.x86_64 already installed and latest version
Package fence-virtd-multicast-0.3.0-16.el7.x86_64 already installed and latest version
Package fence-virtd-serial-0.3.0-16.el7.x86_64 already installed and latest version
Nothing to do
[root@UNIXKB-CP ~]#

 

4. Create a new directory to store the fence key and generate a random key to use for fencing.

[root@UNIXKB-CP ~]# mkdir -p /etc/cluster
[root@UNIXKB-CP ~]# cd /etc/cluster/
[root@UNIXKB-CP cluster]# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=4k count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000506736 s, 8.1 MB/s
[root@UNIXKB-CP cluster]#

 

5. Copy the fence key to the cluster nodes (Node1 & Node2).

[root@UNIXKB-CP cluster]# scp -r /etc/cluster/fence_xvm.key root@Node1:/etc/cluster/fence_xvm.key
root@node1's password:
fence_xvm.key                                                                                                                      100% 4096     4.0KB/s   00:00
[root@UNIXKB-CP cluster]# scp -r /etc/cluster/fence_xvm.key root@Node2:/etc/cluster/fence_xvm.key
root@node2's password:
fence_xvm.key                                                                                                                      100% 4096     4.0KB/s   00:00
[root@UNIXKB-CP cluster]#

Note: You must create the "/etc/cluster" directory on the cluster nodes in order to copy the xvm key.
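
A quick sketch of creating that directory on both cluster nodes from the KVM host (assuming root SSH access to the guests):

[root@UNIXKB-CP cluster]# ssh root@Node1 "mkdir -p /etc/cluster"
[root@UNIXKB-CP cluster]# ssh root@Node2 "mkdir -p /etc/cluster"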

 

6. Use the "fence_virtd -c" command to create the "/etc/fence_virt.conf" file.

[root@UNIXKB-CP ~]# fence_virtd -c
Module search path [/usr/lib64/fence-virt]:

Available backends:
    libvirt 0.1
Available listeners:
    multicast 1.2

Listener modules are responsible for accepting requests
from fencing clients.

Listener module [multicast]:

The multicast listener module is designed for use environments
where the guests and hosts may communicate over a network using
multicast.

The multicast address is the address that a client will use to
send fencing requests to fence_virtd.

Multicast IP Address [225.0.0.12]:

Using ipv4 as family.

Multicast IP Port [1229]:

Setting a preferred interface causes fence_virtd to listen only
on that interface.  Normally, it listens on all interfaces.
In environments where the virtual machines are using the host
machine as a gateway, this *must* be set (typically to virbr0).
Set to 'none' for no interface.

Interface [virbr0]: br0:1

The key file is the shared key information which is used to
authenticate fencing requests.  The contents of this file must
be distributed to each physical host and virtual machine within
a cluster.

Key File [/etc/cluster/fence_xvm.key]:

Backend modules are responsible for routing requests to
the appropriate hypervisor or management layer.

Backend module [libvirt]:

Configuration complete.

=== Begin Configuration ===
backends {
        libvirt {
                uri = "qemu:///system";
        }

}

listeners {
        multicast {
                port = "1229";
                family = "ipv4";
                interface = "br0:1";
                address = "225.0.0.12";
                key_file = "/etc/cluster/fence_xvm.key";
        }

}

fence_virtd {
        module_path = "/usr/lib64/fence-virt";
        backend = "libvirt";
        listener = "multicast";
}

=== End Configuration ===
Replace /etc/fence_virt.conf with the above [y/N]? y
[root@UNIXKB-CP ~]#

Make sure that you are providing the correct interface as the bridge. In my setup, I am using the br0:1 virtual interface to communicate with the KVM guests.

 

7. Start the fence_virtd service.

[root@UNIXKB-CP ~]# systemctl enable fence_virtd.service
[root@UNIXKB-CP ~]# systemctl start fence_virtd.service
[root@UNIXKB-CP ~]# systemctl status fence_virtd.service
fence_virtd.service - Fence-Virt system host daemon
   Loaded: loaded (/usr/lib/systemd/system/fence_virtd.service; enabled)
   Active: active (running) since Wed 2016-01-20 23:36:14 IST; 1s ago
  Process: 3530 ExecStart=/usr/sbin/fence_virtd $FENCE_VIRTD_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 3531 (fence_virtd)
   CGroup: /system.slice/fence_virtd.service
           └─3531 /usr/sbin/fence_virtd -w

Jan 20 23:36:14 UNIXKB-CP systemd[1]: Starting Fence-Virt system host daemon...
Jan 20 23:36:14 UNIXKB-CP systemd[1]: Started Fence-Virt system host daemon.
Jan 20 23:36:14 UNIXKB-CP fence_virtd[3531]: fence_virtd starting.  Listener: libvirt  Backend: multicast
[root@UNIXKB-CP ~]#

 

Configure the Fencing on Cluster Nodes:

1.Login to one of the cluster node.

2. Make sure that both the nodes have the "fence-virt" package installed.

[root@Node1-LAB ~]# rpm -qa fence-virt
fence-virt-0.3.0-16.el7.x86_64
[root@Node1-LAB ~]#

 

3. The following command must succeed in order to configure fencing in the cluster. It lists the virtual machines visible to the fence_xvm agent.

[root@Node1-LAB ~]# fence_xvm -o list
Node1                6daac670-c494-4e02-8d90-96cf900f2be9 on
Node2                17707dcb-7bcc-4b36-9498-a5963d86dc2f on
[root@Node1-LAB ~]#

 

4. Cluster node entries must be present in /etc/hosts.

[root@Node1-LAB ~]# cat /etc/hosts |grep Node
192.168.2.10    Node1-LAB  Node1
192.168.2.11    Node2-LAB  Node2
[root@Node1-LAB ~]#

 

5. Configure the fence_xvm fence agent on the pacemaker cluster.

[root@Node1-LAB ~]# pcs stonith create xvmfence  fence_xvm key_file=/etc/cluster/fence_xvm.key
[root@Node1-LAB ~]# 
[root@Node1-LAB ~]# pcs stonith
 xvmfence       (stonith:fence_xvm):    Started
[root@Node1-LAB ~]#
[root@Node1-LAB ~]# pcs stonith --full
 Resource: xvmfence (class=stonith type=fence_xvm)
  Attributes: key_file=/etc/cluster/fence_xvm.key
  Operations: monitor interval=60s (xvmfence-monitor-interval-60s)
[root@Node1-LAB ~]#
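
Note: This works here because the libvirt domain names match the cluster node names. If they differed, you would typically add the pcmk_host_map parameter so the fence agent can translate a cluster node name into the matching domain name; a hedged sketch (the domain names "Node1-VM" and "Node2-VM" are only illustrative):

[root@Node1-LAB ~]# pcs stonith create xvmfence fence_xvm pcmk_host_map="Node1:Node1-VM;Node2:Node2-VM" key_file=/etc/cluster/fence_xvm.key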

We have successfully configured fencing on the RHEL 7 Pacemaker/Corosync cluster. (The cluster has been configured between two KVM guests.)

 

Validate the STONITH:

 

How should I test my "stonith" configuration? Here is a small demonstration.

1. Login to the one of the cluster node.

2. Try to fence one of the nodes.

[root@Node1-LAB ~]# pcs stonith fence Node2
Node: Node2 fenced
[root@Node1-LAB ~]#

 

This will eventually reboot Node2. The reboot happens based on the "stonith-action" cluster property.

[root@Node1-LAB ~]# pcs property --all |grep stonith-action
 stonith-action: reboot
[root@Node1-LAB ~]#
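
The stonith-action property accepts "reboot" or "off" ("poweroff" is a legacy alias for "off"). If you prefer fenced nodes to stay down for investigation, you could change it; a simple sketch:

[root@Node1-LAB ~]# pcs property set stonith-action=off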

 

STONITH can also be turned on/off using the pcs property command.

[root@Node1-LAB ~]# pcs property --all |grep stonith-enabled
 stonith-enabled: true
[root@Node1-LAB ~]#

 

Hope this article is informative to you.

The post RHEL 7 – How to configure the Fencing on Pacemaker ? appeared first on UnixArena.
