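Installing a Warewulf + Slurm environment with OpenHPC
Warewulf components
- VNFS image creation
- iPXE: compute nodes are installed automatically over PXE
- File synchronization: when users are added, the relevant configuration files are synchronized to the compute nodes automatically
- Node maintenance: manages all compute nodes of the HPC cluster
Environment
- Control node: CentOS 7.7
- Compute node: no operating system installed
Installation steps
Install the OpenHPC repository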
yum install -y http://build.openhpc.community/OpenHPC:/1.3/CentOS_7/x86_64/ohpc-release-1.3-1.el7.x86_64.rpm
Install the docs-ohpc package
[sms]# yum -y install docs-ohpc
[sms]# cp /opt/ohpc/pub/doc/recipes/centos7/input.local input.local
Edit the template file
+sms_ip="${sms_ip:-172.20.0.80}"
+sms_eth_internal="${sms_eth_internal:-ens33}"
+internal_netmask="${internal_netmask:-255.255.255.0}"
+eth_provision="${eth_provision:-ens33}"
+enable_ganglia="${enable_ganglia:-1}"
+enable_nagios="${enable_nagios:-1}"
+nagios_web_password="${nagios_web_password:-admin}"
-slurm_node_config="${slurm_node_config:-c[1-4] Sockets=2 CoresPerSocket=12 ThreadsPerCore=2}"
+slurm_node_config="${slurm_node_config:-c[1-1] Sockets=2 CoresPerSocket=1 ThreadsPerCore=1}"
+num_computes="${num_computes:-1}"
+c_ip[0]=172.20.0.3
+c_mac[0]=00:50:56:39:E9:82
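Every setting above uses the shell's `${name:-default}` expansion, so a value that is already set when the file is sourced takes precedence over the default written in the file. The following is a minimal sketch of that behaviour, not part of the OpenHPC recipe: `show_sms_ip`, the temporary stand-in file, and the address 10.0.0.80 are hypothetical and only serve the illustration.

```shell
# Hypothetical helper: print the value sms_ip resolves to after sourcing
# a settings file written in the same style as input.local.
show_sms_ip() {
  # The subshell keeps the sourced assignments out of the caller's shell.
  ( . "$1" && printf '%s\n' "$sms_ip" )
}

# Stand-in settings file containing one line copied from the edits above.
demo=$(mktemp)
printf '%s\n' 'sms_ip="${sms_ip:-172.20.0.80}"' > "$demo"

# With sms_ip unset, the default from the file is used.
( unset sms_ip; show_sms_ip "$demo" )

# A value set beforehand (hypothetical address) overrides the default.
( sms_ip=10.0.0.80; show_sms_ip "$demo" )
```

This is why editing the default in the file and exporting the variable before running the recipe should both work, assuming the recipe sources the file in the usual way.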
How to determine the slurm_node_config values:
- Sockets=
lscpu | grep "^Socket(s):" | awk '{print $2}'
- CoresPerSocket=
lscpu | grep "^Core(s) per socket:" | awk -F ":" '{print $2}'
- ThreadsPerCore=
lscpu | grep "^Thread(s) per core:" | awk -F ":" '{print $2}'
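The three lookups above can be combined into one step. The sketch below is an illustration under assumptions: `slurm_node_config_from_lscpu` is a hypothetical helper name, it expects English `lscpu` labels (hence `LC_ALL=C` in the usage note), and it should be fed output from a compute node or from identical hardware.

```shell
# Hypothetical helper: build a slurm_node_config value from lscpu output.
# $1 is the node range (for example 'c[1-4]'); lscpu output is read from stdin.
slurm_node_config_from_lscpu() {
  awk -F: -v nodes="$1" '
    # Strip the padding that lscpu puts around the value column.
    { gsub(/^[ \t]+|[ \t]+$/, "", $2) }
    $1 == "Socket(s)"          { sockets = $2 }
    $1 == "Core(s) per socket" { cores = $2 }
    $1 == "Thread(s) per core" { threads = $2 }
    END {
      printf "%s Sockets=%s CoresPerSocket=%s ThreadsPerCore=%s\n",
             nodes, sockets, cores, threads
    }
  '
}
```

A possible invocation is `LC_ALL=C lscpu | slurm_node_config_from_lscpu 'c[1-4]'`; the printed string can then be pasted into input.local.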
Copy the template installation script
[sms]# cp -p /opt/ohpc/pub/doc/recipes/centos7/x86_64/warewulf/slurm/recipe.sh .
[sms]# export OHPC_INPUT_LOCAL=./input.local
[sms]# ./recipe.sh
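Notes:
- With a good network connection, the installation takes about 1 hour
- Because the nodes are VMware virtual machines, the script may prompt for an IPMI password; enter any value and ignore it
- To install the system on c1 over PXE, boot c1 from the network once the installation has finished; it is then provisioned automatically over PXE
- After the installation, the following pages are reachable over the web:
- http://172.20.0.80/ganglia/
- http://172.20.0.80/nagios/ username / password: nagiosadmin / admin
- http://172.20.0.80/WW/ipxe/cfg/00:50:56:39:e9:82 (Warewulf iPXE address; a Forbidden response when accessing /WW/ and similar paths is normal)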
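The iPXE configuration URL in the last item is built from the control node address and the node's MAC address. The sketch below derives it for any node; `ipxe_cfg_url` is a hypothetical helper, and lowercasing the MAC is an assumption based only on the lowercase form seen in the URL here, not on Warewulf documentation.

```shell
# Hypothetical helper: print the iPXE config URL for one node.
# $1 is the control node IP, $2 is the node MAC address in any letter case.
ipxe_cfg_url() {
  # Assumption: the URL uses the lowercase MAC, as in the example in these notes.
  ipxe_mac=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  printf 'http://%s/WW/ipxe/cfg/%s\n' "$1" "$ipxe_mac"
}

# Values taken from the settings used in these notes.
ipxe_cfg_url 172.20.0.80 00:50:56:39:E9:82
```

Passing the printed URL to `curl -s` on the control node is one way to check that the node has a boot configuration before powering it on.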
Test
su - test
Manual installation process
# Create the base OS (BOS) image
$ export CHROOT=/opt/ohpc/admin/images/centos7.7
$ wwmkchroot centos-7 $CHROOT
# Install the OpenHPC base packages
$ yum install -y --installroot=$CHROOT ohpc-base-compute
# Update resolv.conf
$ cp -p /etc/resolv.conf $CHROOT/etc/resolv.conf
# Initialize warewulf database and ssh_keys
[sms]# wwinit database
[sms]# wwinit ssh_keys
# Warewulf can import arbitrary files from the management node and distribute them to the compute nodes
[sms]# wwsh file import /etc/passwd
[sms]# wwsh file import /etc/group
[sms]# wwsh file import /etc/shadow
# Assemble the bootstrap image
# (Optional) Include drivers from kernel updates; needed if enabling additional kernel modules on computes
[sms]# export WW_CONF=/etc/warewulf/bootstrap.conf
[sms]# echo "drivers += updates/kernel/" >> $WW_CONF
# (Optional) Include overlayfs drivers; needed by Singularity
[sms]# echo "drivers += overlay" >> $WW_CONF
# Build bootstrap image
[sms]# wwbootstrap `uname -r`
# Assemble the Virtual Node File System (VNFS) image
wwvnfs --chroot $CHROOT
# Register the compute nodes
# Set provisioning interface as the default networking device
[sms]# echo "GATEWAYDEV=${eth_provision}" > /tmp/network.$$
[sms]# wwsh -y file import /tmp/network.$$ --name network
[sms]# wwsh -y file set network --path /etc/sysconfig/network --mode=0644 --uid=0
# Add nodes to Warewulf data store
[sms]# for ((i=0; i<$num_computes; i++)) ; do
wwsh -y node new ${c_name[i]} --ipaddr=${c_ip[i]} --hwaddr=${c_mac[i]} -D ${eth_provision}
done
# Additional step required if desiring to use predictable network interface
# naming schemes (e.g. en4s0f0). Skip if using eth# style names.
[sms]# export kargs="${kargs} net.ifnames=1,biosdevname=1"
[sms]# wwsh provision set --postnetdown=1 "${compute_regex}"
# Define provisioning image for hosts
[sms]# wwsh -y provision set "${compute_regex}" --vnfs=centos7.7 --bootstrap=`uname -r` \
--files=dynamic_hosts,passwd,group,shadow,network
# Restart dhcp / update PXE
[sms]# systemctl restart dhcpd
[sms]# wwsh pxe update
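Before running the node registration loop above, it can help to print the commands it would issue and review them. The sketch below is a dry run only: `print_node_commands` is a hypothetical helper that reads one "name ip mac" triple per line and echoes the corresponding `wwsh -y node new` command without executing it.

```shell
# Hypothetical dry-run helper: print, but do not run, the registration commands.
# $1 is the provisioning interface; stdin carries "name ip mac" lines.
print_node_commands() {
  while read -r node_name node_ip node_mac; do
    # Skip blank lines in the input.
    [ -n "$node_name" ] || continue
    printf 'wwsh -y node new %s --ipaddr=%s --hwaddr=%s -D %s\n' \
      "$node_name" "$node_ip" "$node_mac" "$1"
  done
}

# Example using the single node defined in these notes.
print_node_commands ens33 <<'EOF'
c1 172.20.0.3 00:50:56:39:E9:82
EOF
```

Once the printed lines look right, they can be run as they are or replaced by the loop from the recipe.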
# Compute node disk configuration: GRUB2
# Add GRUB2 bootloader and re-assemble VNFS image
[sms]# yum -y --installroot=$CHROOT install grub2
[sms]# wwvnfs --chroot $CHROOT
# Select (and customize) appropriate parted layout example
[sms]# cp /etc/warewulf/filesystem/examples/gpt_example.cmds /etc/warewulf/filesystem/gpt.cmds
[sms]# wwsh provision set --filesystem=gpt "${compute_regex}"
[sms]# wwsh provision set --bootloader=sda "${compute_regex}"
# Compute node disk configuration: UEFI
# Add GRUB2 bootloader and re-assemble VNFS image
[sms]# yum -y --installroot=$CHROOT install grub2-efi grub2-efi-modules
[sms]# wwvnfs --chroot $CHROOT
[sms]# cp /etc/warewulf/filesystem/examples/efi_example.cmds /etc/warewulf/filesystem/efi.cmds
[sms]# wwsh provision set --filesystem=efi "${compute_regex}"
[sms]# wwsh provision set --bootloader=sda "${compute_regex}"
# Configure Warewulf to use the node's local storage as the boot device
# Configure local boot (after successful provisioning)
[sms]# wwsh provision set --bootlocal=normal "${compute_regex}"
# Development tools
# Install autotools meta-package
[sms]# yum -y install ohpc-autotools
[sms]# yum -y install EasyBuild-ohpc
[sms]# yum -y install hwloc-ohpc
[sms]# yum -y install spack-ohpc
[sms]# yum -y install valgrind-ohpc
# Compilers
yum -y install gnu8-compilers-ohpc
# MPI
yum -y install openmpi3-gnu8-ohpc mpich-gnu8-ohpc
# Performance tools
# Install perf-tools meta-package
[sms]# yum -y install ohpc-gnu8-perf-tools
[sms]# useradd -m test
[sms]# wwsh file resync passwd shadow group
[sms]# pdsh -w $compute_prefix[1-4] /warewulf/bin/wwgetfiles
$ wwsh node list
NAME GROUPS IPADDR HWADDR
================================================================================
c1 UNDEF 172.20.0.3 00:50:56:39:e9:82
$ wwsh file help
USAGE:
file <command> [options] [targets]
SUMMARY:
The file command is used for manipulating file objects. It allows you to
import, export, create, and modify files within the Warewulf data store.
File objects may be used to supply files to nodes at provision time,
dynamically create files or scripts based on Warewulf data and more.
COMMANDS:
import Import a file into a file object
export Export file object(s)
edit Edit the file in the data store directly
new Create a new file in the data store
set Set file attributes/metadata
show Show the contents of a file
list List a summary of imported file(s)
print Print all file attributes
(re)sync Sync the data of a file object with its source(s)
delete Remove a node configuration from the data store
help Show usage information
OPTIONS:
-l, --lookup Identify files by specified property (default: "name")
-p, --program What external program should be used (edit/show)
-D, --path Set destination (i.e., output) path for this file
-o, --origin Set origin (i.e., input) path for this file
-m, --mode Set permission attribute for this file
-u, --uid Set the UID for this file
-g, --gid Set the GID for this file
-n, --name Set the reference name for this file (not path!)
--interpreter Set the interpreter name to parse this file
NOTE: Use "UNDEF" to erase the current contents of a given field.
EXAMPLES:
Warewulf> file import /path/to/file/to/import --name=hosts-file
Warewulf> file import /path/to/file/to/import/with/given-name
Warewulf> file edit given-name
Warewulf> file set given-name --origin=UNDEF --mode=0700
Warewulf> file set hosts-file --path=/etc/hosts --mode=0644 --uid=0
Warewulf> file list
Warewulf> file delete name123 given-name
wwsh node list
NAME GROUPS IPADDR HWADDR
================================================================================
c11 UNDEF 172.18.1.11 fa:16:3e:62:20:f6
c12 UNDEF 172.18.1.12 fa:16:3e:f7:d3:46
c13 UNDEF 172.18.1.13 fa:16:3e:81:1d:b7
Operations
Incomplete files
su: warning: cannot change directory to /test: No such file or directory
-bash: .: /opt/ohpc/admin/lmod/lmod/init/bash: cannot execute binary file
-bash: module: command not found
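Machines booted by Warewulf mount a RAM disk as the root filesystem by default. If the node has too little memory, the image files do not all fit, which produces errors like the ones above. Adding memory to the node works around the problem.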
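A quick way to confirm this condition on a compute node is to look at how full the root filesystem is. The sketch below is one possible check, not something from the OpenHPC recipe: `root_nearly_full` is a hypothetical helper that reads `df -P /` output and succeeds when usage reaches a threshold (90 percent unless another value is given).

```shell
# Hypothetical helper: succeed when the root filesystem usage is at or above
# a threshold. stdin carries `df -P /` output; $1 is the threshold in percent.
root_nearly_full() {
  awk -v limit="${1:-90}" '
    # The mount point is the last column; the usage percentage is just before it.
    NR > 1 && $NF == "/" { use = $(NF - 1); sub(/%$/, "", use); exit }
    END { exit !(use + 0 >= limit + 0) }
  '
}
```

On a node this could be used as `df -P / | root_nearly_full 90 && echo "root filesystem is nearly full"`.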
Updating the Warewulf image
$ export OHPC_INPUT_LOCAL="/root/conf/input.local"
$ export CHROOT=/opt/ohpc/admin/images/centos7.7
$ yum install --installroot=$CHROOT openssh-server vi vim -y
$ chroot $CHROOT systemctl enable sshd
$ chroot $CHROOT systemctl is-enabled sshd
$ wwsh pxe update -v
$ wwvnfs --chroot $CHROOT
Using 'centos7.7' as the VNFS name
Creating VNFS image from centos7.7
Compiling hybridization link tree : 0.24 s
Building file list : 0.81 s
Compiling and compressing VNFS : 76.23 s
Adding image to datastore : 20.05 s
Total elapsed time : 97.33 s
Adding a new compute node
(Omitted.)
FAQ
-bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
-bash: .: /opt/ohpc/admin/lmod/lmod/init/bash: cannot execute binary file
-bash: module: command not found
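The files involved in these two errors are provided by lmod-ohpc-8.1.18-6.1.ohpc.1.3.9.x86_64 and gnu8-compilers-ohpc respectively. Their MD5 checksums turned out not to match, so reinstall the packages as described in the section on updating the Warewulf image: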
$ export CHROOT="/opt/ohpc/admin/images/centos7.7"
$ yum reinstall -y --installroot=$CHROOT lmod-ohpc gnu8-compilers-ohpc
$ wwsh pxe update -v
$ wwvnfs --chroot $CHROOT
tmpfs 991M 991M 0 100% /
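The checksum comparison mentioned above can be scripted. The notes do not say which two copies were compared, so the sketch below simply compares any two paths; `md5_matches` is a hypothetical helper, and it relies on `md5sum` from coreutils being available.

```shell
# Hypothetical helper: succeed when two files have the same MD5 checksum.
# $1 and $2 are paths to the two copies being compared.
md5_matches() {
  # Reading from stdin keeps the file name out of the md5sum output.
  md5_a=$(md5sum < "$1" | awk '{print $1}')
  md5_b=$(md5sum < "$2" | awk '{print $1}')
  [ "$md5_a" = "$md5_b" ]
}
```

For example, `md5_matches fileA fileB || echo "checksums differ"`, where fileA and fileB stand for a known-good copy and the copy seen on the node.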
Appendix
Selected files
/etc/dhcp/dhcpd.conf
cat /etc/dhcp/dhcpd.conf
# DHCPD Configuration written by Warewulf. Do not edit this file, rather
# edit the template: /etc/warewulf/dhcpd-template.conf
allow booting;
allow bootp;
ddns-update-style interim;
authoritative;
option space ipxe;
# Tell iPXE to not wait for ProxyDHCP requests to speed up boot.
option ipxe.no-pxedhcp code 176 = unsigned integer 8;
option ipxe.no-pxedhcp 1;
option architecture-type code 93 = unsigned integer 16;
if exists user-class and option user-class = "iPXE" {
filename "http://172.20.0.80/WW/ipxe/cfg/${mac}";
} else {
if option architecture-type = 00:0B {
filename "/warewulf/ipxe/bin-arm64-efi/snp.efi";
} elsif option architecture-type = 00:0A {
filename "/warewulf/ipxe/bin-arm32-efi/placeholder.efi";
} elsif option architecture-type = 00:09 {
filename "/warewulf/ipxe/bin-x86_64-efi/snp.efi";
} elsif option architecture-type = 00:07 {
filename "/warewulf/ipxe/bin-x86_64-efi/snp.efi";
} elsif option architecture-type = 00:06 {
filename "/warewulf/ipxe/bin-i386-efi/snp.efi";
} elsif option architecture-type = 00:00 {
filename "/warewulf/ipxe/bin-i386-pcbios/undionly.kpxe";
}
}
subnet 172.20.0.0 netmask 255.255.255.0 {
not authoritative;
# option interface-mtu 9000;
option subnet-mask 255.255.255.0;
}
# Node entries will follow below
group {
# Evaluating Warewulf node: c1 (DB ID:9)
# Adding host entry for c1-ens33
host c1-ens33 {
option host-name c1;
hardware ethernet 00:50:56:39:e9:82;
fixed-address 172.20.0.3;
next-server 172.20.0.80;
}
}
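To cross-check the generated host entries against `wwsh node list`, the name, MAC address, and IP address of each entry can be pulled out of the file. The sketch below assumes the layout shown above (one `host` block per node, with `hardware ethernet` before `fixed-address`); `list_dhcp_hosts` is a hypothetical helper, not a Warewulf command.

```shell
# Hypothetical helper: print "name mac ip" for each host entry in a
# dhcpd.conf laid out like the one above. $1 is the file path
# (defaults to /etc/dhcp/dhcpd.conf).
list_dhcp_hosts() {
  awk '
    $1 == "host"                         { name = $2 }
    $1 == "hardware" && $2 == "ethernet" { mac = $3; sub(/;$/, "", mac) }
    # fixed-address comes after the MAC line, so the entry is complete here.
    $1 == "fixed-address"                { ip = $2; sub(/;$/, "", ip); print name, mac, ip }
  ' "${1:-/etc/dhcp/dhcpd.conf}"
}
```

Running `list_dhcp_hosts` on the control node prints one line per registered node, which is easier to compare with the node list than the full configuration file.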