Installing a Warewulf + Slurm Environment with OpenHPC

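Warewulf components

  • VNFS image building
  • iPXE: compute nodes are installed automatically over PXE
  • File synchronization: when a user is added, the relevant configuration files are synced to the compute nodes automatically
  • Node management: manages all compute nodes of the HPC cluster
    • Maintains /etc/hosts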

Environment

  • Control node: CentOS 7.7
    • sms 172.20.0.80
  • Compute node: no operating system installed
    • c1 172.20.0.3

Installation steps

Install the OpenHPC repository

[sms]# yum install -y http://build.openhpc.community/OpenHPC:/1.3/CentOS_7/x86_64/ohpc-release-1.3-1.el7.x86_64.rpm

Install the docs-ohpc package

[sms]# yum -y install docs-ohpc

Copy the template file input.local

[sms]# cp /opt/ohpc/pub/doc/recipes/centos7/input.local input.local

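Edit the template file. In the listing below, + marks a line as used in this setup and - marks the template line it replaces: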

+sms_ip="${sms_ip:-172.20.0.80}"
+sms_eth_internal="${sms_eth_internal:-ens33}"
+internal_netmask="${internal_netmask:-255.255.255.0}"
+eth_provision="${eth_provision:-ens33}"
+enable_ganglia="${enable_ganglia:-1}"
+enable_nagios="${enable_nagios:-1}"
+nagios_web_password="${nagios_web_password:-admin}"
-slurm_node_config="${slurm_node_config:-c[1-4] Sockets=2 CoresPerSocket=12 ThreadsPerCore=2}"
+slurm_node_config="${slurm_node_config:-c[1-1] Sockets=2 CoresPerSocket=1 ThreadsPerCore=1}"
+num_computes="${num_computes:-1}"
+c_ip[0]=172.20.0.3
+c_mac[0]=00:50:56:39:E9:82
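
Before running the recipe it can help to confirm that input.local defines an IP and a MAC address for every compute node it claims to have. The following is a minimal sketch, not part of OpenHPC: the function name check_input_local is hypothetical, and it assumes the file keeps the `num_computes=`, `c_ip[N]=` and `c_mac[N]=` line shapes shown above.

```shell
# Check that an input.local-style file has c_ip[i] and c_mac[i]
# for every index below num_computes. Hypothetical helper, not an OpenHPC tool.
check_input_local() {
    file=$1
    # Take the digits from the last num_computes= line, e.g. "${num_computes:-1}" -> 1
    n=$(grep '^num_computes=' "$file" | tail -n 1 | tr -cd '0-9')
    [ -n "$n" ] || { echo "num_computes not found in $file" >&2; return 2; }
    i=0
    status=0
    while [ "$i" -lt "$n" ]; do
        for key in c_ip c_mac; do
            if ! grep -q "^${key}\[${i}\]=" "$file"; then
                echo "missing ${key}[${i}]" >&2
                status=1
            fi
        done
        i=$((i + 1))
    done
    return "$status"
}

# Demo on a throwaway copy of the values used in this article
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
num_computes="${num_computes:-1}"
c_ip[0]=172.20.0.3
c_mac[0]=00:50:56:39:E9:82
EOF
check_input_local "$tmp" && echo "input.local looks consistent"
rm -f "$tmp"
```

The check only reads the file with grep, so nothing in input.local is executed.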

The slurm_node_config values can be read from lscpu:

  • Sockets: lscpu | grep "^Socket(s):" | awk '{print $2}'
  • CoresPerSocket: lscpu | grep "^Core(s) per socket:" | awk -F ":" '{print $2}'
  • ThreadsPerCore: lscpu | grep "^Thread(s) per core:" | awk -F ":" '{print $2}'
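
The three lookups above can be combined into one step. This is a rough sketch under the assumption that lscpu prints the English field labels used above; the function name slurm_node_config_from_lscpu is made up for illustration, and the node range is passed in by hand rather than detected.

```shell
# Build a slurm_node_config value from lscpu output read on stdin.
# $1 is the node range (e.g. 'c[1-4]'), supplied by the caller.
slurm_node_config_from_lscpu() {
    node_range=$1
    awk -F: -v nodes="$node_range" '
        # v holds the text after the colon with surrounding whitespace removed
        { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v) }
        $1 == "Socket(s)"          { sockets = v }
        $1 == "Core(s) per socket" { cores = v }
        $1 == "Thread(s) per core" { threads = v }
        END {
            printf "%s Sockets=%s CoresPerSocket=%s ThreadsPerCore=%s\n",
                   nodes, sockets, cores, threads
        }'
}

# Demo with canned lscpu lines matching the single-node setup in this article
sample='Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             2'
printf '%s\n' "$sample" | slurm_node_config_from_lscpu 'c[1-1]'
```

On real hardware the input would come from the compute node itself, for example `lscpu | slurm_node_config_from_lscpu 'c[1-4]'`.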

Copy the template install script

[sms]# cp -p /opt/ohpc/pub/doc/recipes/centos7/x86_64/warewulf/slurm/recipe.sh .
  • Customize recipe.sh as needed
  • Run the deployment
[sms]# export OHPC_INPUT_LOCAL=./input.local
[sms]# ./recipe.sh

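Notes:

  • With a good network connection, the installation completes in roughly 1 hour
  • Because the machines run under VMware virtualization, you may be prompted for an IPMI password; enter any value and move on
  • To provision c1 over PXE, boot c1 from the network once the installation on sms has finished; it is then installed automatically via PXE
  • After the installation, the following pages are reachable over the web:
    • http://172.20.0.80/ganglia/
    • http://172.20.0.80/nagios/ (username / password: nagiosadmin / admin)
    • http://172.20.0.80/WW/ipxe/cfg/00:50:56:39:e9:82 (the Warewulf iPXE config URL; a Forbidden response when browsing /WW/ and similar paths is expected)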
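
The iPXE config URL in the last item follows the pattern that also appears in dhcpd.conf in the appendix: the sms address followed by /WW/ipxe/cfg/ and the node's MAC address in lowercase. A small sketch of building that URL for any node is shown here; the helper name ipxe_cfg_url is hypothetical, and the pattern is inferred from this article's example rather than taken from Warewulf documentation.

```shell
# Print the Warewulf iPXE config URL for a node.
# $1 = sms IP address, $2 = node MAC address (any letter case)
ipxe_cfg_url() {
    sms_ip=$1
    # The URL in this article uses the MAC in lowercase
    mac=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    printf 'http://%s/WW/ipxe/cfg/%s\n' "$sms_ip" "$mac"
}

ipxe_cfg_url 172.20.0.80 00:50:56:39:E9:82
```

On the cluster the result could be passed to a client such as curl to see the boot script that c1 would receive.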

Test

su - test

Manual installation process

# Build the base OS (BOS) image
$ export CHROOT=/opt/ohpc/admin/images/centos7.7
$ wwmkchroot centos-7 $CHROOT

# Install the OpenHPC base packages
$ yum install -y --installroot=$CHROOT ohpc-base-compute

# Update resolv.conf
$ cp -p /etc/resolv.conf $CHROOT/etc/resolv.conf

# Initialize warewulf database and ssh_keys
[sms]# wwinit database
[sms]# wwinit ssh_keys

# Warewulf can import arbitrary files from the management node and distribute them to the compute nodes
[sms]# wwsh file import /etc/passwd
[sms]# wwsh file import /etc/group
[sms]# wwsh file import /etc/shadow

# Assemble the bootstrap image
# (Optional) Include drivers from kernel updates; needed if enabling additional kernel modules on computes
[sms]# export WW_CONF=/etc/warewulf/bootstrap.conf
[sms]# echo "drivers += updates/kernel/" >> $WW_CONF

# (Optional) Include overlayfs drivers; needed by Singularity
[sms]# echo "drivers += overlay" >> $WW_CONF

# Build bootstrap image
[sms]# wwbootstrap `uname -r`

# Assemble the Virtual Node File System (VNFS) image
[sms]# wwvnfs --chroot $CHROOT

# Register the compute nodes
# Set provisioning interface as the default networking device
[sms]# echo "GATEWAYDEV=${eth_provision}" > /tmp/network.$$
[sms]# wwsh -y file import /tmp/network.$$ --name network
[sms]# wwsh -y file set network --path /etc/sysconfig/network --mode=0644 --uid=0

# Add nodes to Warewulf data store
[sms]# for ((i=0; i<$num_computes; i++)) ; do
           wwsh -y node new ${c_name[i]} --ipaddr=${c_ip[i]} --hwaddr=${c_mac[i]} -D ${eth_provision}
       done

# Additional step required if desiring to use predictable network interface
# naming schemes (e.g. en4s0f0). Skip if using eth# style names.
[sms]# export kargs="${kargs} net.ifnames=1,biosdevname=1"
[sms]# wwsh provision set --postnetdown=1 "${compute_regex}"

# Define provisioning image for hosts
[sms]# wwsh -y provision set "${compute_regex}" --vnfs=centos7.7 --bootstrap=`uname -r` \
--files=dynamic_hosts,passwd,group,shadow,network

# Restart dhcp / update PXE
[sms]# systemctl restart dhcpd
[sms]# wwsh pxe update

# Compute-node disk configuration: GRUB2
# Add GRUB2 bootloader and re-assemble VNFS image
[sms]# yum -y --installroot=$CHROOT install grub2
[sms]# wwvnfs --chroot $CHROOT

# Select (and customize) appropriate parted layout example
[sms]# cp /etc/warewulf/filesystem/examples/gpt_example.cmds /etc/warewulf/filesystem/gpt.cmds
[sms]# wwsh provision set --filesystem=gpt "${compute_regex}"
[sms]# wwsh provision set --bootloader=sda "${compute_regex}"

# Compute-node disk configuration: UEFI
# Add GRUB2 bootloader and re-assemble VNFS image
[sms]# yum -y --installroot=$CHROOT install grub2-efi grub2-efi-modules
[sms]# wwvnfs --chroot $CHROOT
[sms]# cp /etc/warewulf/filesystem/examples/efi_example.cmds /etc/warewulf/filesystem/efi.cmds
[sms]# wwsh provision set --filesystem=efi "${compute_regex}"
[sms]# wwsh provision set --bootloader=sda "${compute_regex}"

# Configure Warewulf to use the node's local storage as the boot device
# Configure local boot (after successful provisioning)
[sms]# wwsh provision set --bootlocal=normal "${compute_regex}"

# Development tools
# Install autotools meta-package
[sms]# yum -y install ohpc-autotools
[sms]# yum -y install EasyBuild-ohpc
[sms]# yum -y install hwloc-ohpc
[sms]# yum -y install spack-ohpc
[sms]# yum -y install valgrind-ohpc

# Compilers
[sms]# yum -y install gnu8-compilers-ohpc

# MPI
[sms]# yum -y install openmpi3-gnu8-ohpc mpich-gnu8-ohpc

# Performance tools
# Install perf-tools meta-package
[sms]# yum -y install ohpc-gnu8-perf-tools

# Add a user
[sms]# useradd -m test
[sms]# wwsh file resync passwd shadow group
[sms]# pdsh -w $compute_prefix[1-4] /warewulf/bin/wwgetfiles

Provisioning services rely on:

  • DHCP
  • TFTP
  • HTTP

Usage

$ wwsh node list
NAME                GROUPS              IPADDR              HWADDR
================================================================================
c1                  UNDEF               172.20.0.3          00:50:56:39:e9:82
$ wwsh file help
USAGE:
     file <command> [options] [targets]

SUMMARY:
     The file command is used for manipulating file objects.  It allows you to
     import, export, create, and modify files within the Warewulf data store.
     File objects may be used to supply files to nodes at provision time,
     dynamically create files or scripts based on Warewulf data and more.

COMMANDS:
     import             Import a file into a file object
     export             Export file object(s)
     edit               Edit the file in the data store directly
     new                Create a new file in the data store
     set                Set file attributes/metadata
     show               Show the contents of a file
     list               List a summary of imported file(s)
     print              Print all file attributes
     (re)sync           Sync the data of a file object with its source(s)
     delete             Remove a node configuration from the data store
     help               Show usage information

OPTIONS:
     -l, --lookup       Identify files by specified property (default: "name")
     -p, --program      What external program should be used (edit/show)
     -D, --path         Set destination (i.e., output) path for this file
     -o, --origin       Set origin (i.e., input) path for this file
     -m, --mode         Set permission attribute for this file
     -u, --uid          Set the UID for this file
     -g, --gid          Set the GID for this file
     -n, --name         Set the reference name for this file (not path!)
         --interpreter  Set the interpreter name to parse this file

NOTE:  Use "UNDEF" to erase the current contents of a given field.

EXAMPLES:
     Warewulf> file import /path/to/file/to/import --name=hosts-file
     Warewulf> file import /path/to/file/to/import/with/given-name
     Warewulf> file edit given-name
     Warewulf> file set given-name --origin=UNDEF --mode=0700
     Warewulf> file set hosts-file --path=/etc/hosts --mode=0644 --uid=0
     Warewulf> file list
     Warewulf> file delete name123 given-name


$ wwsh node list
NAME                GROUPS              IPADDR              HWADDR
================================================================================
c11                 UNDEF               172.18.1.11         fa:16:3e:62:20:f6
c12                 UNDEF               172.18.1.12         fa:16:3e:f7:d3:46
c13                 UNDEF               172.18.1.13         fa:16:3e:81:1d:b7

Operations

Incomplete files on compute nodes

su: warning: cannot change directory to /test: No such file or directory
-bash: .: /opt/ohpc/admin/lmod/lmod/init/bash: cannot execute binary file
-bash: module: command not found

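A machine booted by Warewulf mounts a RAM disk as its root filesystem by default. If the node has too little memory, the image files cannot all be stored, which produces errors like the ones above. Giving the node more memory works around the problem.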
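
A quick way to spot this condition is to look at how full the root filesystem is. The sketch below is only an illustration: warn_if_full is a hypothetical helper that reads `df -P` output on stdin and reports any filesystem at or above a chosen percentage. The canned sample line is invented for the demo.

```shell
# Read `df -P` output on stdin; warn about filesystems at or above $1 percent used.
warn_if_full() {
    threshold=$1
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the df header line
            use = $5
            sub(/%/, "", use)         # "100%" -> "100"
            if (use + 0 >= limit + 0)
                printf "WARNING: %s mounted on %s is %s%% full\n", $1, $6, use
        }'
}

# Demo with a canned df line for a completely full tmpfs root
printf 'Filesystem 1024-blocks Used Available Capacity Mounted on\ntmpfs 1014784 1014784 0 100%% /\n' \
    | warn_if_full 90
```

On a compute node the same check would be `df -P / | warn_if_full 90`.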

Updating the Warewulf image

$ export OHPC_INPUT_LOCAL="/root/conf/input.local"
$ export CHROOT=/opt/ohpc/admin/images/centos7.7
$ yum install --installroot=$CHROOT openssh-server vi vim -y
$ chroot $CHROOT systemctl enable sshd
$ chroot $CHROOT systemctl is-enabled sshd
$ wwsh pxe update -v
$ wwvnfs --chroot $CHROOT
Using 'centos7.7' as the VNFS name
Creating VNFS image from centos7.7
Compiling hybridization link tree                           : 0.24 s
Building file list                                          : 0.81 s
Compiling and compressing VNFS                              : 76.23 s
Adding image to datastore                                   : 20.05 s
Total elapsed time                                          : 97.33 s

Adding a new compute node
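
One plausible sequence reuses the registration commands from the manual installation process above. The sketch below only prints those commands so they can be reviewed before anything is run. The function name print_add_node_cmds, the node name c2, its IP address and its MAC address are all hypothetical; the Slurm node definition (slurm_node_config) would also have to cover the new node, which is not shown here.

```shell
# Print (do not run) the commands for registering one more compute node.
# Arguments: node name, IP address, MAC address, provisioning interface, VNFS name
print_add_node_cmds() {
    name=$1; ip=$2; mac=$3; dev=$4; vnfs=$5
    cat <<EOF
wwsh -y node new $name --ipaddr=$ip --hwaddr=$mac -D $dev
wwsh -y provision set $name --vnfs=$vnfs --bootstrap=\$(uname -r) --files=dynamic_hosts,passwd,group,shadow,network
systemctl restart dhcpd
wwsh pxe update
EOF
}

# Hypothetical second node on the same network as c1
print_add_node_cmds c2 172.20.0.4 00:50:56:00:00:02 ens33 centos7.7
```

After checking the printed commands, they can be pasted into a root shell on sms and the new node booted from the network.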

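FAQ

  • When c1 is provisioned over PXE, a getvnfs extracting ERROR may appear; a likely cause is insufficient space on the root filesystem /

  • Commands run on a compute node report errors:

-bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)

-bash: .: /opt/ohpc/admin/lmod/lmod/init/bash: cannot execute binary file
-bash: module: command not found

The files behind these two errors are provided by the lmod-ohpc-8.1.18-6.1.ohpc.1.3.9.x86_64 and gnu8-compilers-ohpc packages respectively. Their md5 checksums turned out not to match, so reinstall both packages following the Warewulf image update section above: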

$ export CHROOT="/opt/ohpc/admin/images/centos7.7"
$ yum reinstall -y --installroot=$CHROOT lmod-ohpc gnu8-compilers-ohpc
$ wwsh pxe update -v
$ wwvnfs --chroot $CHROOT

Root filesystem usage on the node (df -h output), showing the tmpfs root completely full:

tmpfs           991M  991M     0 100% /
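
The md5 comparison mentioned in this FAQ entry can be sketched as follows. same_md5 is a hypothetical helper built on the standard md5sum tool; the demo uses two throwaway files, one of them cut short to imitate a file that did not fit on a full root filesystem.

```shell
# Return success when two files have the same md5 checksum.
same_md5() {
    sum1=$(md5sum < "$1" | awk '{print $1}')
    sum2=$(md5sum < "$2" | awk '{print $1}')
    [ "$sum1" = "$sum2" ]
}

# Demo: a complete file versus a truncated copy
good=$(mktemp)
bad=$(mktemp)
printf 'module init script\n' > "$good"
printf 'module init scr' > "$bad"
if same_md5 "$good" "$bad"; then echo "identical"; else echo "MISMATCH"; fi
rm -f "$good" "$bad"
```

On the cluster the two arguments could be the file under $CHROOT on sms and a copy of the same file fetched from the compute node.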

Appendix

Selected files

/etc/dhcp/dhcpd.conf

cat /etc/dhcp/dhcpd.conf
# DHCPD Configuration written by Warewulf. Do not edit this file, rather
# edit the template: /etc/warewulf/dhcpd-template.conf

allow booting;
allow bootp;
ddns-update-style interim;
authoritative;

option space ipxe;

# Tell iPXE to not wait for ProxyDHCP requests to speed up boot.
option ipxe.no-pxedhcp code 176 = unsigned integer 8;
option ipxe.no-pxedhcp 1;

option architecture-type   code 93  = unsigned integer 16;

if exists user-class and option user-class = "iPXE" {
    filename "http://172.20.0.80/WW/ipxe/cfg/${mac}";
} else {
    if option architecture-type = 00:0B {
        filename "/warewulf/ipxe/bin-arm64-efi/snp.efi";
    } elsif option architecture-type = 00:0A {
        filename "/warewulf/ipxe/bin-arm32-efi/placeholder.efi";
    } elsif option architecture-type = 00:09 {
        filename "/warewulf/ipxe/bin-x86_64-efi/snp.efi";
    } elsif option architecture-type = 00:07 {
        filename "/warewulf/ipxe/bin-x86_64-efi/snp.efi";
    } elsif option architecture-type = 00:06 {
        filename "/warewulf/ipxe/bin-i386-efi/snp.efi";
    } elsif option architecture-type = 00:00 {
        filename "/warewulf/ipxe/bin-i386-pcbios/undionly.kpxe";
    }
}

subnet 172.20.0.0 netmask 255.255.255.0 {
   not authoritative;
   # option interface-mtu 9000;
   option subnet-mask 255.255.255.0;
}

# Node entries will follow below


group {
   # Evaluating Warewulf node: c1 (DB ID:9)
   # Adding host entry for c1-ens33
   host c1-ens33 {
      option host-name c1;
      hardware ethernet 00:50:56:39:e9:82;
      fixed-address 172.20.0.3;
      next-server 172.20.0.80;
   }
}
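
To see at a glance which nodes Warewulf has written into this file, the host entries can be pulled out with awk. This is a minimal sketch, assuming each host block lists `hardware ethernet` before `fixed-address` as in the listing above; list_dhcp_hosts is a hypothetical name.

```shell
# Read dhcpd.conf on stdin and print "host-entry MAC IP" for each host block.
list_dhcp_hosts() {
    awk '
        $1 == "host"                         { name = $2 }
        $1 == "hardware" && $2 == "ethernet" { mac = $3; sub(/;$/, "", mac) }
        $1 == "fixed-address"                { ip = $2; sub(/;$/, "", ip); print name, mac, ip }
    '
}

# Demo using the host block from the listing above
list_dhcp_hosts <<'EOF'
group {
   host c1-ens33 {
      option host-name c1;
      hardware ethernet 00:50:56:39:e9:82;
      fixed-address 172.20.0.3;
      next-server 172.20.0.80;
   }
}
EOF
```

On sms the function would be fed the real file: `list_dhcp_hosts < /etc/dhcp/dhcpd.conf`.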

References

  1. https://github.com/openhpc/ohpc/wiki/1.3.X