Pacemaker is a cluster resource manager that uses Corosync to manage the heartbeat. Pacemaker is the continuation of the Cluster Resource Manager (CRM) project originally developed for the Heartbeat project.
Pacemaker features
- Failure detection and recovery at the host and application level
- Supports practically any redundancy configuration
- Supports multiple cluster configuration modes at the same time:
  - Active/Active
  - Active/Passive
  - N+1
  - N+M, etc.
- Supports application startup/shutdown ordering (see the sketch after this list)
- Supports applications with multiple modes (e.g. master/slave)
- Ability to test any failure or cluster state
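Startup/shutdown ordering is expressed with ordering constraints once the resources exist. A minimal sketch, assuming two hypothetical resources named vip and web (the latter is not part of this article's setup):

# start vip before web; Pacemaker stops them in the reverse order
pcs constraint order start vip then start web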
Installation
Environment
172.20.0.21 vm1
172.20.0.22 vm2
172.20.0.23 vm3
Configuration on all machines (a sketch follows the list):
- Add hostname resolution entries to /etc/hosts
- Disable the firewall
- Disable SELinux
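A minimal sketch of that preparation on CentOS/RHEL 7, assuming firewalld is the active firewall; adapt to your environment:

cat >> /etc/hosts <<'EOF'
172.20.0.21 vm1
172.20.0.22 vm2
172.20.0.23 vm3
EOF
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config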
Install
Run the following on vm1~vm3:
# install pcs and the fence agents
yum install -y pcs fence-agents-all
systemctl enable pcsd
systemctl start pcsd
# set the password of the hacluster user (used by pcs cluster auth)
echo xiexianbin.cn | passwd --stdin hacluster
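Before authenticating the nodes it is worth confirming pcsd is actually listening; a quick check, assuming the default pcsd TCP port 2224:

systemctl status pcsd
ss -tlnp | grep 2224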
- Configure cluster auth on vm1
pcs cluster auth vm1 vm2 vm3 -u hacluster -p xiexianbin.cn [--force]
$ pcs cluster setup --start --name mycluster vm1 vm2 vm3 # use --force to force cluster creation
Destroying cluster on nodes: vm1, vm2, vm3...
vm1: Stopping Cluster (pacemaker)...
vm2: Stopping Cluster (pacemaker)...
vm3: Stopping Cluster (pacemaker)...
vm1: Successfully destroyed cluster
vm2: Successfully destroyed cluster
vm3: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'vm1', 'vm2', 'vm3'
vm1: successful distribution of the file 'pacemaker_remote authkey'
vm2: successful distribution of the file 'pacemaker_remote authkey'
vm3: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
vm1: Succeeded
vm2: Succeeded
vm3: Succeeded
Starting cluster on nodes: vm1, vm2, vm3...
vm1: Starting Cluster (corosync)...
vm2: Starting Cluster (corosync)...
vm3: Starting Cluster (corosync)...
vm2: Starting Cluster (pacemaker)...
vm3: Starting Cluster (pacemaker)...
vm1: Starting Cluster (pacemaker)...
Synchronizing pcsd certificates on nodes vm1, vm2, vm3...
vm2: Success
vm3: Success
vm1: Success
Restarting pcsd on the nodes in order to reload the certificates...
vm2: Success
vm3: Success
vm1: Success
pcs cluster start --all
$ pcs cluster enable --all
vm1: Cluster Enabled
vm2: Cluster Enabled
vm3: Cluster Enabled
Usage
help
$ pcs --help
Usage: pcs [-f file] [-h] [commands]...
Control and configure pacemaker and corosync.
Options:
-h, --help Display usage and exit.
-f file Perform actions on file instead of active CIB.
--debug Print all network traffic and external commands run.
--version Print pcs version information. List pcs capabilities if
--full is specified.
--request-timeout Timeout for each outgoing request to another node in
seconds. Default is 60s.
--force Override checks and errors, the exact behavior depends on
the command. WARNING: Using the --force option is
strongly discouraged unless you know what you are doing.
Commands:
cluster Configure cluster options and nodes.
resource Manage cluster resources.
stonith Manage fence devices.
constraint Manage resource constraints.
property Manage pacemaker properties.
acl Manage pacemaker access control lists.
qdevice Manage quorum device provider on the local host.
quorum Manage cluster quorum settings.
booth Manage booth (cluster ticket manager).
status View cluster status.
config View and manage cluster configuration.
pcsd Manage pcs daemon.
node Manage cluster nodes.
alert Manage pacemaker alerts.
client Manage pcsd client configuration.
Check pcs status
$ pcs status
$ pcs status cluster
$ pcs status corosync
# cluster status
$ pcs cluster status
Cluster Status:
Stack: corosync
Current DC: vm1 (version 1.1.19-8.el7-c3c624ea3d) - partition WITHOUT quorum
Last updated: Sun Aug  4 10:21:40 2019
Last change: Sun Aug  4 10:19:14 2019 by hacluster via crmd on vm1
3 nodes configured
0 resource instances configured
PCSD Status:
vm1: Online
vm2: Online
vm3: Online
Check corosync status
crm_mon -1
pcs cluster configuration
# configuration check
$ crm_verify -L -V
$ pcs property --help
# Run 'man pengine' and 'man crmd' to get a description of the properties.
# When there is no fencing device, disable stonith (important). This is a cluster-wide property and only needs to be set once.
# WARNING: no stonith devices and stonith-enabled is not false
$ pcs property set stonith-enabled=false
$ pcs property set pe-warn-series-max=1000 \
pe-input-series-max=1000 \
pe-error-series-max=1000 \
cluster-recheck-interval=3min
$ pcs property list
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: mycluster
cluster-recheck-interval: 3min
dc-version: 1.1.23-1.el7_9.1-9acf116022
have-watchdog: false
pe-error-series-max: 1000
pe-input-series-max: 1000
pe-warn-series-max: 1000
stonith-enabled: false
symmetric-cluster: false
# Set the default resource stickiness (prevents resources from failing back)
pcs resource defaults resource-stickiness=100
# Set the default resource operation timeout
pcs resource op defaults timeout=10s
pcs resource op defaults
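pcs resource op defaults above lists the operation defaults; similarly, pcs resource defaults with no arguments lists the resource defaults (e.g. the stickiness just set):

pcs resource defaults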
Configure the VIP
$ pcs resource create vip ocf:heartbeat:IPaddr2 \
ip=172.20.0.20 cidr_netmask=24 nic=ens33 \
op monitor interval=30s
[root@vm1 ~]# ip a show ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:50:56:38:88:62 brd ff:ff:ff:ff:ff:ff
inet 172.20.0.21/24 brd 172.20.0.255 scope global ens33
valid_lft forever preferred_lft forever
inet 172.20.0.20/24 brd 172.20.0.255 scope global secondary ens33
valid_lft forever preferred_lft forever
# simulate a failure: manually remove the VIP from the node
ip addr del 172.20.0.20/24 dev ens33
You can see that the VIP is re-added on vm1 or fails over to another node.
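A failover can also be triggered administratively. A minimal sketch: pcs resource move places a temporary location constraint, which should be cleared afterwards or the VIP stays pinned to vm2:

pcs resource move vip vm2
pcs status resources
pcs resource clear vip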
View the configuration
[root@vm1 ~]# pcs resource
vip (ocf::heartbeat:IPaddr2): Stopped
[root@vm1 ~]# pcs resource show
vip (ocf::heartbeat:IPaddr2): Stopped
[root@vm1 ~]# pcs resource show vip
Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: cidr_netmask=24 ip=172.20.0.20 nic=ens33
Operations: monitor interval=30s (vip-monitor-interval-30s)
start interval=0s timeout=20s (vip-start-interval-0s)
stop interval=0s timeout=20s (vip-stop-interval-0s)
# show the full cluster configuration
pcs config
# update a resource parameter
pcs resource update vip ip=172.20.0.24
# delete a resource
pcs resource delete vip
# reset resource state and fail count
pcs resource cleanup
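Failure counters can also be inspected or reset per resource; a sketch, assuming the vip resource defined above:

pcs resource failcount show vip
pcs resource failcount reset vip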
# Add and remove nodes
pcs cluster node add <new server>
pcs cluster node remove [node]
# Control node state
pcs cluster standby <server>
pcs cluster standby --all
pcs cluster unstandby <server>
pcs cluster unstandby --all
# configuration check
crm_verify -L -V
# View membership information
corosync-cmapctl | grep members
corosync-cfgtool -s
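For example, to drain one node for maintenance and bring it back afterwards (a sketch using vm2 from this environment):

pcs cluster standby vm2
pcs status nodes
pcs cluster unstandby vm2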
Related files
- /etc/corosync/corosync.conf, synced to the other nodes with pcs cluster sync
- /var/log/pacemaker.log
- /var/log/pcsd/pcsd.log
FAQ
Handling a failed node
# on the failed node
systemctl stop pcsd pacemaker corosync
systemctl start pcsd
# run on a healthy node
pcs cluster sync
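Once the configuration has been synced, the repaired node can be started again and the cluster state checked; a sketch, assuming vm3 was the failed node:

pcs cluster start vm3
pcs status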
Restart the cluster
pcs cluster stop --all
pcs cluster sync
pcs cluster start --all
VIP configuration does not take effect
vm3 crmd[9263]: notice: Result of probe operation for vip on vm3: 7 (not running)
Fewer than 3 cluster nodes had joined; the problem was resolved after the missing node was repaired.
partition WITHOUT quorum
If no more than half of the total number of nodes are online, the cluster has no quorum and will not run resources.
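In a test environment (or a two-node cluster) the quorum requirement can be relaxed; a hedged sketch, not something to apply to a production cluster without understanding the consequences:

pcs property set no-quorum-policy=ignore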
Node vm1: UNCLEAN (offline)
pcs property set stonith-enabled=false