The software responsible for making a cluster of computers work together is called a Distributed Resource Management System (DRMS). The most commonly used ones are SGE (Sun Grid Engine), PBS/TORQUE, and SLURM.
PBS commands vs SGE commands: http://www.softpanorama.org/HPC/PBS_and_derivatives/Reference/pbs_command_vs_sge_commands.shtml
SGE to SLURM conversion
I. Sun Grid Engine installation on CentOS Server
http://biowiki.org/wiki/index.php/Sun_Grid_Engine
Create a user account named sgeadmin on the head node and on all the exec nodes, with the group name also sgeadmin (the group name itself is probably not important, as long as it is the same on all the nodes). Make sure the user ID and group ID of this account are identical on all those nodes (this consistency is very important).
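## add the group first (use the same GID on every node)
sudo groupadd -g <GID> sgeadmin
## add the user with the same UID on every node, with sgeadmin as its primary group
sudo useradd -m -u <UID> -g sgeadmin sgeadmin
## set the password interactively (useradd -p expects an already-encrypted password, not plain text)
sudo passwd sgeadmin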
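Since matching UIDs/GIDs across nodes is the part that matters, it can help to compare them explicitly. A minimal sketch; ids_of and ids_consistent are hypothetical helper names, and collecting the values from the other nodes (for example over ssh) is left to you:

```shell
#!/usr/bin/env bash
# Print "uid:gid" for a user on the current machine.
ids_of() {
  printf '%s:%s\n' "$(id -u "$1")" "$(id -g "$1")"
}

# Return 0 if every "uid:gid" argument equals the first one, 1 otherwise.
ids_consistent() {
  local first=$1 x
  shift
  for x in "$@"; do
    [ "$x" = "$first" ] || return 1
  done
  return 0
}

# Example (node1 is a hypothetical host name reachable via ssh):
#   ids_consistent "$(ids_of sgeadmin)" \
#     "$(ssh node1 id -u sgeadmin):$(ssh node1 id -g sgeadmin)"
```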
STEP 1: PREPARE THE FILES
Download https://arc.liv.ac.uk/trac/SGE/
sge-8.0.0a-common.tar.gz and sge-8.0.0a-bin-lx-amd64.tar.gz
We will do a local installation on each node: each node running SGE gets its own copy of the SGE binaries and its own local spool directory. This minimizes NFS traffic, since NFS will probably already be used heavily for writing the output of SGE jobs to the RAID node and for other things.
Use the same $SGE_ROOT (/opt/sge) on every node.
Read README.BUILD.
# Install the libhwloc-dev deb package
# change to root: sudo su -
sudo apt-get update && sudo apt-get upgrade -y
sudo apt-get install libhwloc-dev
2.1. Build the dependencies (as root):
# untar the 2 files into /opt/sge
sudo mkdir /opt/sge
tar xvf sge-8.0.0a-common.tar.gz --directory /opt/sge
tar xvf sge-8.0.0a-bin-lx-amd64.tar.gz --directory /opt/sge
cd /home/canlab/wSourceCode/sge-8.1.9/source
sh scripts/bootstrap.sh -no-java -no-jni
./aimk -no-java -no-jni
2.2 The Configuration File: SGE provides automated installation scripts that read the options you set in a configuration file and perform the installation with them. We are going to use a configuration file based on the template in wSourceCode/sge-8.1.9/source/dist/util/install_modules/inst_template.conf
Make a copy of the template named tha_configuration.conf and fill out the options:
SGE_ROOT="/opt/sge"
SGE_JMX_SSL_CLIENT="false"
CELL_NAME="default"
ADMIN_USER=canlab
QMASTER_SPOOL_DIR=$SGE_ROOT/$CELL_NAME/spool/qmaster
EXECD_SPOOL_DIR=$SGE_ROOT/$CELL_NAME/spool
ADMIN_HOST_LIST="canHead"
SUBMIT_HOST_LIST="canHead"
EXEC_HOST_LIST="canHead"
EXECD_SPOOL_DIR_LOCAL="$SGE_ROOT/$CELL_NAME/spool/execd"
ADMIN_MAIL="none"
SGE_QMASTER_PORT="6444"
SGE_EXECD_PORT="6445"
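Instead of editing the copy by hand, the options can also be set from the shell, which is handy when the same file is regenerated for several clusters. A minimal sketch, assuming the shell-style KEY="value" lines used by inst_template.conf and GNU sed; set_conf_option is a hypothetical helper, and values containing | or & would need escaping:

```shell
#!/usr/bin/env bash
# Set KEY="VALUE" in an SGE install config file.
# Replaces an existing KEY=... line, or appends one if KEY is absent.
# Assumes GNU sed (-i without a suffix) and values without | or & characters.
set_conf_option() {
  local file=$1 key=$2 value=$3
  if grep -q "^${key}=" "$file"; then
    # | is the sed delimiter because values are often paths containing /
    sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$file"
  else
    printf '%s="%s"\n' "$key" "$value" >> "$file"
  fi
}

# Usage:
#   cp inst_template.conf tha_configuration.conf
#   set_conf_option tha_configuration.conf SGE_ROOT /opt/sge
#   set_conf_option tha_configuration.conf EXEC_HOST_LIST canHead
```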
Install:
Log into the head node as root and add the SGE_QMASTER_PORT and SGE_EXECD_PORT (two different ports, set in the configuration file) to your /etc/services file:
# SUN GRID ENGINE
sge_qmaster     6444/tcp    # for Sun Grid Engine (SGE) qmaster daemon
sge_execd       6445/tcp    # for Sun Grid Engine (SGE) exec daemon
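If this step is scripted, it is worth making it safe to run twice, so that repeated installs do not pile up duplicate lines in /etc/services. A minimal sketch; add_sge_services is a hypothetical helper, and the default ports 6444/6445 are the ones used above and must match the configuration file:

```shell
#!/usr/bin/env bash
# Append the SGE service entries to a services file (default /etc/services)
# only if no sge_qmaster / sge_execd entry is present yet, so re-running is harmless.
add_sge_services() {
  local file=${1:-/etc/services} qport=${2:-6444} eport=${3:-6445}
  if ! grep -qsE '^sge_qmaster[[:space:]]' "$file"; then
    printf 'sge_qmaster\t%s/tcp\t# for Sun Grid Engine (SGE) qmaster daemon\n' "$qport" >> "$file"
  fi
  if ! grep -qsE '^sge_execd[[:space:]]' "$file"; then
    printf 'sge_execd\t%s/tcp\t# for Sun Grid Engine (SGE) exec daemon\n' "$eport" >> "$file"
  fi
}

# Usage (as root): add_sge_services /etc/services 6444 6445
```

Afterwards, getent services sge_qmaster should print the qmaster entry.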
Execute the inst_sge script on the head node with the parameters -m (install Master Host, which is also the implied Submit and Administration Host), -x (install Execution Hosts), and -auto (read settings from the configuration file). In our case, this will be:
export SGE_ROOT=/opt/sge
cd /opt/sge
./inst_sge -m -x -auto /opt/sge/util/install_modules/tha_configuration.conf
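Because -auto takes every answer from the configuration file, it can be worth confirming that the keys you rely on are actually set before running inst_sge. A minimal pre-flight sketch, assuming the KEY=value / KEY="value" lines of inst_template.conf; check_conf_keys is a hypothetical helper, not part of SGE:

```shell
#!/usr/bin/env bash
# Report keys that are missing or empty in an SGE install config file.
# Returns 0 if all listed keys have a non-empty value, 1 otherwise.
check_conf_keys() {
  local file=$1 key line value missing=0
  shift
  for key in "$@"; do
    # Take the last assignment of KEY, as a later line overrides an earlier one.
    line=$(grep -sE "^${key}=" "$file" | tail -n 1)
    value=${line#*=}
    # Strip one pair of surrounding double quotes, if present.
    value=${value%\"}
    value=${value#\"}
    if [ -z "$value" ]; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_conf_keys /opt/sge/util/install_modules/tha_configuration.conf \
#     SGE_ROOT CELL_NAME ADMIN_USER EXEC_HOST_LIST SGE_QMASTER_PORT SGE_EXECD_PORT
```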
II. Sun Grid Engine installation on Ubuntu Server
II.1. Try this for SGE: https://tkainrad.dev/posts/copy-paste-ready-instructions-to-set-up-1-node-clusters/
* Gain root permissions. On Ubuntu: sudo -i (or just put sudo before each command)
* Install Dependencies:
sudo apt-get update -y \
&& sudo apt-get install -y sudo bsd-mailx tcsh db5.3-util libhwloc5 libmunge2 libxm4 libjemalloc1 xterm openjdk-8-jre-headless \
&& sudo apt-get clean \
&& sudo rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
1. Create a new folder and download the SGE 8.1.9 .deb packages:
sudo mkdir -p /opt/sge/installfolder
export INSTALLFOLDER=/opt/sge/installfolder
cd $INSTALLFOLDER
sudo wget https://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/sge-common_8.1.9_all.deb
sudo wget https://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/sge-doc_8.1.9_all.deb
sudo wget https://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/sge_8.1.9_amd64.deb
sudo dpkg -i --force-all ./*.deb
2. Setup files:
Download the following 4 files (sge_init.sh, sge_auto_install.conf, sge_hostgrp.conf, sge_exec_host.conf) and place them in /opt/sge/installfolder as well, e.g.:
sudo wget https://tkainrad.dev/other/sge_init.sh
# These scripts and configuration files perform the setup automatically.
# Edit file sge_auto_install.conf
SGE_ROOT="/opt/sge"
SGE_CLUSTER_NAME="docker-sge"
CELL_NAME="default"
# Then set SGE_HOST from the act_qmaster file and restart the daemons:
export SGE_HOST=$(cat /opt/sge/default/common/act_qmaster)
/etc/init.d/sgemaster.docker-sge restart
/etc/init.d/sgeexecd.docker-sge restart
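The act_qmaster file holds the hostname of the current qmaster, so SGE_HOST has to be filled in with command substitution rather than a quoted string. A minimal sketch that also fails loudly when the file does not exist yet (for example before the install has run); read_act_qmaster is a hypothetical helper, and the /opt/sge and default fallbacks are the values used in these notes:

```shell
#!/usr/bin/env bash
# Print the qmaster hostname stored in $SGE_ROOT/$SGE_CELL/common/act_qmaster.
# Falls back to /opt/sge and the "default" cell when the variables are unset.
read_act_qmaster() {
  local file="${SGE_ROOT:-/opt/sge}/${SGE_CELL:-default}/common/act_qmaster"
  if [ ! -r "$file" ]; then
    echo "cannot read $file (is the qmaster installed?)" >&2
    return 1
  fi
  # The file holds a single hostname; print only the first line.
  head -n 1 "$file"
}

# Usage:
#   export SGE_HOST=$(read_act_qmaster)
```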
# After the download, we need to set some environment variables in the current shell:
export SGE_ROOT=/opt/sge
export SGE_CELL=default
# We also need a profile.d entry so that login shells load the SGE settings:
sudo ln -s $SGE_ROOT/$SGE_CELL/common/settings.sh /etc/profile.d/sge_settings.sh
3. Install
# execute the following to install SGE and perform setup operations:
useradd -r -m -U -G sudo -d /home/sgeuser -s /bin/bash -c "Docker SGE user" sgeuser
cd $SGE_ROOT
sudo ./inst_sge -m -x -s -auto $INSTALLFOLDER/sge_auto_install.conf \
&& sleep 10 \
&& /etc/init.d/sgemaster.docker-sge restart \
&& /etc/init.d/sgeexecd.docker-sge restart \
&& sed -i "s/HOSTNAME/`hostname`/" $INSTALLFOLDER/sge_exec_host.conf \
&& sed -i "s/HOSTNAME/`hostname`/" $INSTALLFOLDER/sge_hostgrp.conf \
&& /opt/sge/bin/lx-amd64/qconf -Me $INSTALLFOLDER/sge_exec_host.conf
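The sed -i calls above replace the HOSTNAME placeholder inside the downloaded files themselves, so a second run (or a run on another machine with the same files) finds no placeholder left. One option is to keep the files as templates and render a copy per host. A minimal sketch; render_host_conf and the /tmp output path are hypothetical, and unlike the commands above it replaces every HOSTNAME on a line (the g flag):

```shell
#!/usr/bin/env bash
# Render a config template by replacing the literal placeholder HOSTNAME with a host name.
# Writes to a separate output file so the template stays reusable.
# The host name defaults to the output of `hostname`.
render_host_conf() {
  local template=$1 output=$2 host=${3:-$(hostname)}
  sed "s/HOSTNAME/${host}/g" "$template" > "$output"
}

# Usage:
#   render_host_conf "$INSTALLFOLDER/sge_exec_host.conf" /tmp/sge_exec_host.conf
#   /opt/sge/bin/lx-amd64/qconf -Me /tmp/sge_exec_host.conf
```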
## Note: to reinstall, first delete the init scripts in /etc/init.d and the cell directory:
sudo rm -rf /etc/init.d/sgemaster.docker-sge
sudo rm -rf /etc/init.d/sgeexecd.docker-sge
sudo rm -rf /opt/sge/default
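The init script names contain the cluster name (SGE_CLUSTER_NAME, docker-sge here), so the cleanup can be written once with that as a parameter. A minimal sketch; sge_reinstall_cleanup is a hypothetical helper, and the PREFIX variable is only there so it can be tried against a scratch directory (leave it empty on a real system):

```shell
#!/usr/bin/env bash
# Remove what a previous "inst_sge -auto" run left behind so the install can be repeated.
# Arguments: cluster name, SGE root, cell name (defaults are the values in these notes).
# PREFIX is empty on a real system; point it at a scratch directory to test safely.
sge_reinstall_cleanup() {
  local cluster=${1:-docker-sge} sge_root=${2:-/opt/sge} cell=${3:-default}
  local prefix=${PREFIX:-}
  rm -rf -- "$prefix/etc/init.d/sgemaster.$cluster" \
            "$prefix/etc/init.d/sgeexecd.$cluster" \
            "$prefix$sge_root/$cell"
}

# Usage (as root): sge_reinstall_cleanup docker-sge /opt/sge default
```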
4. Add users
# We still need to add users to the sgeusers group (an SGE access list), which was defined in the sge_hostgrp.conf file you just applied. Only users from this group are allowed to submit jobs. The general form is:
# /opt/sge/bin/lx-amd64/qconf -au <USER> sgeusers
# Therefore, we run the following:
sudo /opt/sge/bin/lx-amd64/qconf -au canlab sgeusers
/opt/sge/bin/lx-amd64/qconf -au hung sgeusers
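With more than a couple of users, the same qconf -au call can be looped. A minimal sketch; add_sge_users is a hypothetical helper, QCONF is an assumed override variable (it defaults to the qconf path used above, and can point at a stand-in to try the loop without SGE), and it must be run by an SGE manager such as root:

```shell
#!/usr/bin/env bash
# Add one or more users to an SGE access list, one "qconf -au USER LIST" call per user.
# Stops at the first failing call and returns 1.
add_sge_users() {
  local list=$1 user
  shift
  local qconf=${QCONF:-/opt/sge/bin/lx-amd64/qconf}
  for user in "$@"; do
    "$qconf" -au "$user" "$list" || return 1
  done
}

# Usage (as an SGE manager, e.g. root): add_sge_users sgeusers canlab hung
```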
## TORQUE 6.1.0 release tarball (used in B. PBS / Torque below when not cloning from GitHub):
wget --quiet http://wpfilebase.s3.amazonaws.com/torque/torque-6.1.0.tar.gz
II.2 Another way:
https://www.socher.org/index.php/Main/HowToInstallSunGridEngineOnUbuntu
https://peteris.rocks/blog/sun-grid-engine-installation-on-ubuntu-server/
1. On Master Node
-m (install Master Host, which is also the implied Submit and Administration Host)
-x (install Execution Hosts)
https://gist.github.com/asadharis/9d14da97d9ad1f8eccc36dc14390e4e0
git clone https://gist.github.com/9d14da97d9ad1f8eccc36dc14390e4e0.git sgeSetup/
cd sgeSetup
sudo chmod +x install_sge.sh loop.sh sleep.sh
./install_sge.sh
./loop.sh
2. On Worker Nodes
3. Uninstall SGE
https://howtoinstall.co/en/ubuntu/xenial/gridengine-master?action=remove
sudo apt-get autoremove --purge gridengine-master
II.3 Configure SGE
https://southgreenplatform.github.io/trainings/hpc/sgeinstallation/
B. PBS / Torque
# RHEL, CentOS, and Scientific Linux: yum install
# Ubuntu: sudo apt-get install
I. PBS on Ubuntu
http://docs.adaptivecomputing.com/torque/5-0-0/Content/topics/torque/1-installConfig/installing.htm
https://pmateusz.github.io/linux/torque/2017/03/25/torque-installation-on-ubuntu.html
http://docs.adaptivecomputing.com/torque/5-1-3/Content/topics/hpcSuiteInstall/manual/1-installing/installingTorque.htm
https://tkainrad.dev/posts/copy-paste-ready-instructions-to-set-up-1-node-clusters/#pbs--torque
1. Install the packages required to run TORQUE 5.1.1:
sudo apt-get install libboost-all-dev libssl-dev libxml2-dev tcl8.6-dev tk8.6-dev libhwloc-dev cpuset
2. Download & install TORQUE
https://ubuntuforums.org/showthread.php?t=289767
## from github (in use)
git clone https://github.com/adaptivecomputing/torque.git -b 6.1.1 torque-6.1.1
cd torque-6.1.1
./autogen.sh
## without GitHub (release tarball)
tar -xvzf torque-6.1.0.tar.gz
#############
#############
./configure --prefix=/opt/torque --disable-werror