Linux Pacemaker + Corosync High Availability Cluster Setup


Pacemaker ("heart pacemaker") is a cluster resource manager that uses Corosync for cluster membership and heartbeat messaging. Pacemaker is the continuation of the Cluster Resource Manager (CRM) project originally developed for the Heartbeat project.

Pacemaker Features

  • Failure detection and recovery at both the host and application level
  • Supports practically any redundancy configuration
  • Supports multiple cluster configuration modes at the same time
    • Active/Active
    • Active/Passive
    • N+1
    • N+M, etc.
  • Supports application startup/shutdown ordering
  • Supports applications that run in multiple modes (e.g. master/slave)
  • Ability to test the cluster's response to any failure or cluster state

Installation

Environment

172.20.0.21 vm1
172.20.0.22 vm2
172.20.0.23 vm3

Configure the following on all machines (a command sketch follows this list):

  • Add hostname resolution entries to /etc/hosts
  • Disable the firewall
  • Disable SELinux
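
A minimal sketch of these preparation steps on CentOS/RHEL 7, using the hostnames and IPs listed above (adjust the addresses to your environment):

# hostname resolution via /etc/hosts
cat >> /etc/hosts <<EOF
172.20.0.21 vm1
172.20.0.22 vm2
172.20.0.23 vm3
EOF

# disable the firewall
systemctl stop firewalld
systemctl disable firewalld

# disable SELinux (the config change takes full effect after a reboot)
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config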

Install

Run the following steps on vm1 through vm3:

  • Install the RPM packages:
yum install -y pcs fence-agents-all
  • Enable and start the pcsd service:
systemctl enable pcsd
systemctl start pcsd
  • Set the hacluster password on every node:
echo xiexianbin.cn | passwd --stdin hacluster
  • On vm1, authenticate the cluster nodes:
pcs cluster auth vm1 vm2 vm3 -u hacluster -p xiexianbin.cn [--force]
  • Create a pcs cluster named mycluster:
$ pcs cluster setup --start --name mycluster vm1 vm2 vm3  # add --force to force (re)creating the cluster
Destroying cluster on nodes: vm1, vm2, vm3...
vm1: Stopping Cluster (pacemaker)...
vm2: Stopping Cluster (pacemaker)...
vm3: Stopping Cluster (pacemaker)...
vm1: Successfully destroyed cluster
vm2: Successfully destroyed cluster
vm3: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'vm1', 'vm2', 'vm3'
vm1: successful distribution of the file 'pacemaker_remote authkey'
vm2: successful distribution of the file 'pacemaker_remote authkey'
vm3: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
vm1: Succeeded
vm2: Succeeded
vm3: Succeeded

Starting cluster on nodes: vm1, vm2, vm3...
vm1: Starting Cluster (corosync)...
vm2: Starting Cluster (corosync)...
vm3: Starting Cluster (corosync)...
vm2: Starting Cluster (pacemaker)...
vm3: Starting Cluster (pacemaker)...
vm1: Starting Cluster (pacemaker)...

Synchronizing pcsd certificates on nodes vm1, vm2, vm3...
vm2: Success
vm3: Success
vm1: Success
Restarting pcsd on the nodes in order to reload the certificates...
vm2: Success
vm3: Success
vm1: Success
  • Start the cluster (already started by the previous step):
pcs cluster start --all
  • Enable the cluster services to start automatically at boot on all nodes:
$ pcs cluster enable --all
vm1: Cluster Enabled
vm2: Cluster Enabled
vm3: Cluster Enabled

Usage

help

$ pcs --help

Usage: pcs [-f file] [-h] [commands]...
Control and configure pacemaker and corosync.

Options:
    -h, --help         Display usage and exit.
    -f file            Perform actions on file instead of active CIB.
    --debug            Print all network traffic and external commands run.
    --version          Print pcs version information. List pcs capabilities if
                       --full is specified.
    --request-timeout  Timeout for each outgoing request to another node in
                       seconds. Default is 60s.
    --force            Override checks and errors, the exact behavior depends on
                       the command. WARNING: Using the --force option is
                       strongly discouraged unless you know what you are doing.

Commands:
    cluster     Configure cluster options and nodes.
    resource    Manage cluster resources.
    stonith     Manage fence devices.
    constraint  Manage resource constraints.
    property    Manage pacemaker properties.
    acl         Manage pacemaker access control lists.
    qdevice     Manage quorum device provider on the local host.
    quorum      Manage cluster quorum settings.
    booth       Manage booth (cluster ticket manager).
    status      View cluster status.
    config      View and manage cluster configuration.
    pcsd        Manage pcs daemon.
    node        Manage cluster nodes.
    alert       Manage pacemaker alerts.
    client      Manage pcsd client configuration.

Check pcs status

$ pcs status
$ pcs status cluster
$ pcs status corosync

# cluster status
$ pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: vm1 (version 1.1.19-8.el7-c3c624ea3d) - partition WITHOUT quorum
 Last updated: Sun Aug  4 10:21:40 2019
 Last change: Sun Aug  4 10:19:14 2019 by hacluster via crmd on vm1
 3 nodes configured
 0 resource instances configured

PCSD Status:
  vm1: Online
  vm2: Online
  vm3: Online

Check corosync status

crm_mon -1

pcs cluster configuration

# validate the configuration
$ crm_verify -L -V

$ pcs property --help
# Run 'man pengine' and 'man crmd' to get a description of the properties.

# Run on all nodes: when there is no fencing device, disable stonith (important)
# WARNING: no stonith devices and stonith-enabled is not false
$ pcs property set stonith-enabled=false

$ pcs property set pe-warn-series-max=1000 \
pe-input-series-max=1000 \
pe-error-series-max=1000 \
cluster-recheck-interval=3min

$ pcs property list
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: mycluster
 cluster-recheck-interval: 3min
 dc-version: 1.1.23-1.el7_9.1-9acf116022
 have-watchdog: false
 pe-error-series-max: 1000
 pe-input-series-max: 1000
 pe-warn-series-max: 1000
 stonith-enabled: false
 symmetric-cluster: false

# Set the default resource stickiness (prevents resources from failing back)
pcs resource defaults resource-stickiness=100

# Set the default operation timeout
pcs resource op defaults timeout=10s
pcs resource op defaults

Configure the VIP

$ pcs resource create vip ocf:heartbeat:IPaddr2 \
ip=172.20.0.20 cidr_netmask=24 nic=ens33 \
op monitor interval=30s
  • The VIP is now configured on node vm1:
[root@vm1 ~]# ip a show ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:50:56:38:88:62 brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.21/24 brd 172.20.0.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 172.20.0.20/24 brd 172.20.0.255 scope global secondary ens33
       valid_lft forever preferred_lft forever
  • Delete the VIP to simulate a failure:
ip addr del 172.20.0.20/24 dev ens33

You will see that the VIP is re-added on vm1 or fails over to another node.
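
To force a full failover instead of the local re-add, you can also put the node that currently holds the VIP into standby and watch the resource move (the standby/unstandby commands are listed below); a quick sketch:

pcs cluster standby vm1      # evacuate resources from vm1
pcs status resources         # vip should now be Started on vm2 or vm3
pcs cluster unstandby vm1    # bring vm1 back; with the stickiness set above the vip stays where it is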

View the configuration

[root@vm1 ~]# pcs resource
 vip	(ocf::heartbeat:IPaddr2):	Stopped
[root@vm1 ~]# pcs resource show
 vip	(ocf::heartbeat:IPaddr2):	Stopped
[root@vm1 ~]# pcs resource show vip
 Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: cidr_netmask=24 ip=172.20.0.20 nic=ens33
  Operations: monitor interval=30s (vip-monitor-interval-30s)
              start interval=0s timeout=20s (vip-start-interval-0s)
              stop interval=0s timeout=20s (vip-stop-interval-0s)
  • Other useful commands:
pcs config

pcs resource update vip ip=172.20.0.24
pcs resource delete vip
pcs resource cleanup

# add and remove nodes
pcs cluster node add <new server>
pcs cluster node remove [node]

# control node state (standby / unstandby)
pcs cluster standby <server>
pcs cluster standby --all
pcs cluster unstandby <server>
pcs cluster unstandby --all

# configuration check
crm_verify -L -V

# view membership information
corosync-cmapctl | grep members

corosync-cfgtool -s

Related files

  • /etc/corosync/corosync.conf (sync it to the other nodes with pcs cluster sync)
  • /var/log/pacemaker.log
  • /var/log/pcsd/pcsd.log
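
For reference, the corosync.conf that pcs generates for this cluster looks roughly like the following (corosync 2.x on CentOS 7; the exact contents vary by version):

$ cat /etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: mycluster
    secauth: off
    transport: udpu
}

nodelist {
    node {
        ring0_addr: vm1
        nodeid: 1
    }

    node {
        ring0_addr: vm2
        nodeid: 2
    }

    node {
        ring0_addr: vm3
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}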

FAQ

Handling a failed node

# on the failed node
systemctl stop pcsd pacemaker corosync

systemctl start pcsd

# run on a healthy node
pcs cluster sync

Restart the cluster

pcs cluster stop --all
pcs cluster sync
pcs cluster start --all

VIP configuration does not take effect

  • Error log in /var/log/messages:
vm3 crmd[9263]:  notice: Result of probe operation for vip on vm3: 7 (not running)

The cluster did not have all 3 nodes online; after the nodes were repaired the problem went away.
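
After the missing nodes are back online, clearing the failed probe history and rechecking the resource is usually enough, for example:

pcs status nodes         # confirm all three nodes are online
pcs resource cleanup vip
pcs resource show vip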

partition WITHOUT quorum

If no more than half of the total number of nodes are online, the cluster has no quorum and will not run resources.
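
You can inspect the current quorum state with the commands below; in a test or two-node environment the quorum policy can also be relaxed (not recommended for production):

corosync-quorumtool -s
pcs quorum status

# test / two-node clusters only: keep resources running even without quorum
pcs property set no-quorum-policy=ignore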

Node vm1: UNCLEAN (offline)

pcs property set stonith-enabled=false

References

  1. https://clusterlabs.org/