Monday, July 29, 2019

install UCX

I. Install UCX

OpenMPI must be compiled with UCX to use InfiniBand.

To use UCX, you need to:
  1. Get the recent release from
  2. Build it and make it available to your machines
  3. Configure OMPI --with-ucx="path-to-ucx" and rebuild/reinstall it
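Before step 3 it can save a rebuild to check that the UCX prefix really contains an install. This is a minimal sketch: `check_ucx_prefix` is a hypothetical helper, and it assumes a standard `make install` layout (`include/ucp/api/ucp.h` plus `libucp.so` under `lib` or `lib64`). UCP is the UCX layer that Open MPI's ucx PML uses.

```shell
#!/bin/sh
# check_ucx_prefix PREFIX (hypothetical helper): print "--with-ucx=PREFIX"
# if PREFIX looks like a UCX install, otherwise complain and fail.
check_ucx_prefix() {
    prefix=$1
    if [ ! -f "$prefix/include/ucp/api/ucp.h" ]; then
        echo "missing $prefix/include/ucp/api/ucp.h" >&2
        return 1
    fi
    # The shared library may land in lib or lib64 depending on the system.
    for libdir in "$prefix/lib" "$prefix/lib64"; do
        if [ -e "$libdir/libucp.so" ]; then
            echo "--with-ucx=$prefix"
            return 0
        fi
    done
    echo "missing libucp.so under $prefix/lib or $prefix/lib64" >&2
    return 1
}

# Usage (the UCX prefix is whatever you installed to):
# UCX_FLAG=$(check_ucx_prefix "$myUCX") && ../configure "$UCX_FLAG" ...
```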
Afterwards, select the UCX PML when you launch:
$ mpirun -mca btl self -mca pml ucx ....
To control which device and which transports are used, add the following environment variables:
$ mpirun -mca btl self -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1 -x UCX_TLS=rc,shm ....
Try experimenting with different TLS values; see here for more info.
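To compare transports it helps to generate the command lines rather than edit them by hand. A dry-run sketch: `ucx_mpirun_cmd` is a hypothetical helper that only prints the command, and `./my_app` is a placeholder program. The TLS list in the loop is just an example taken from the transport table; `dc` needs Mellanox hardware.

```shell
#!/bin/sh
# ucx_mpirun_cmd DEV:PORT TLS PROGRAM [ARGS...] (hypothetical helper):
# print, but do not run, an mpirun line that selects the UCX PML and
# restricts UCX to one device and one transport list.
ucx_mpirun_cmd() {
    dev=$1
    tls=$2
    shift 2
    echo "mpirun -mca btl self -mca pml ucx -x UCX_NET_DEVICES=$dev -x UCX_TLS=$tls $*"
}

# Dry run: one candidate command per transport combination.
for tls in rc,shm ud,shm dc,shm; do
    ucx_mpirun_cmd mlx5_0:1 "$tls" ./my_app
done
```

Copy the printed line you want to time, or wrap the loop body in `eval` once you are happy with it.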

* OpenMPI 4.0.3 supports UCX 1.7 or older
* OpenMPI 4.0.4 supports newer UCX
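The version note above can be turned into a quick check before building. A sketch under assumptions: `ucx_fits_ompi` is a hypothetical helper, it relies on GNU `sort -V`, and it treats every Open MPI release before 4.0.4 like 4.0.3, which goes beyond what the note says.

```shell
#!/bin/sh
# version_le A B: succeed if version A <= version B (GNU sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# ucx_fits_ompi OMPI_VERSION UCX_VERSION (hypothetical helper):
# Open MPI >= 4.0.4 accepts newer UCX; older releases need UCX <= 1.7.x.
ucx_fits_ompi() {
    ompi=$1
    ucx=$2
    if version_le 4.0.4 "$ompi"; then
        return 0
    fi
    # Compare only major.minor so that 1.7.0 and 1.7.x both count as 1.7.
    ucx_mm=$(echo "$ucx" | cut -d. -f1,2)
    version_le "$ucx_mm" 1.7
}

# Usage:
# ucx_fits_ompi 4.0.3 1.10.1 || echo "use UCX 1.7 or upgrade Open MPI"
```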

#### install from Source (works now, but should not be used, to avoid runtime errors)
## Requirements: autoconf-2.69b, libtool-2.4.6, automake-1.14
# git clone --branch master  ucx-master
cd ucx-master
module load tool_dev/autoconf-2.69b
module load tool_dev/automake-1.14
module load tool_dev/libtool-2.4.6 
export ACLOCAL_PATH=/home1/p001cao/local/app/tool_dev/libtool-2.4.6/share/aclocal
mkdir build   &&  cd build

### (install from Release --> no need for ./autogen.sh)
tar xvf ucx-1.10.1.tar.gz
cd ucx-1.10.1
mkdir build   &&  cd build
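The only difference between the source and release paths is whether `./configure` must be generated first. A small sketch, run from the top of the UCX tree before `mkdir build`: `needs_autogen` is a hypothetical helper that assumes a release tarball ships an executable `configure` while a git checkout only has `autogen.sh`.

```shell
#!/bin/sh
# needs_autogen [DIR] (hypothetical helper): succeed if DIR looks like a
# git checkout that still needs ./autogen.sh before configuring.
needs_autogen() {
    dir=${1:-.}
    # A generated configure script means there is nothing to do.
    [ -x "$dir/configure" ] && return 1
    # Otherwise a git checkout should contain autogen.sh.
    [ -f "$dir/autogen.sh" ]
}

# Usage from the top of the UCX tree:
# if needs_autogen .; then ./autogen.sh; fi
# mkdir build && cd build
```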
## USC2
module load tool_dev/binutils-2.35              # gold
module load compiler/gcc-10.3
export PATH=$PATH:/home1/p001cao/local/app/compiler/gcc-10.3/bin
export CC=gcc CXX=g++ FORTRAN=gfortran
../contrib/configure-release  --enable-mt --with-knem=$myKNEM \
LDFLAGS="-fuse-ld=gold -lrt  -L$myNUMA/lib -Wl,-rpath,$myNUMA/lib" \
CFLAGS="-I$myNUMA/include" \
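The configure lines here expand `$myNUMA` and `$myKNEM`; if one is unset, `-L$myNUMA/lib` quietly becomes `-L/lib` and `-I$myNUMA/include` becomes `-I/include`. A guard to run first, as a sketch: `require_dir_vars` is a hypothetical helper, not part of UCX.

```shell
#!/bin/sh
# require_dir_vars NAME... (hypothetical helper): fail if any named
# variable is empty or does not point at an existing directory.
require_dir_vars() {
    status=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory" >&2
            status=1
        fi
    done
    return $status
}

# Usage before ../contrib/configure-release:
# require_dir_vars myNUMA myKNEM || exit 1
```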

## USC1 (eagle)
module load tool_dev/binutils-2.35              # gold 
module load compiler/gcc-10.3
export PATH=$PATH:/uhome/p001cao/local/app/compiler/gcc-10.3/bin
export CC=gcc CXX=g++ FORTRAN=gfortran
../contrib/configure-release --enable-mt --with-knem=$myKNEM \
LDFLAGS="-fuse-ld=gold -lrt  -L$myNUMA/lib -Wl,-rpath,$myNUMA/lib" \
CFLAGS="-I$myNUMA/include" \

--with-rc --with-ud --with-dc --with-ib-hw-tm --with-dm --with-cm \
## consider options
  --with-verbs(=DIR)      Build OpenFabrics support, adding DIR/include,
                          DIR/lib, and DIR/lib64 to the search path for
                          headers and libraries
  --with-rc               Compile with IB Reliable Connection support
  --with-ud               Compile with IB Unreliable Datagram support
  --with-dc               Compile with IB Dynamic Connection support
  --with-mlx5-dv          Compile with mlx5 Direct Verbs support. Direct Verbs
                          (DV) support provides additional acceleration
                          capabilities that are not available in a regular
  --with-ib-hw-tm         Compile with IB Tag Matching support
  --with-dm               Compile with Device Memory support
  --with-cm               Compile with IB Connection Manager support

##-- Consider
LDFLAGS="-fuse-ld=gold -lrt  -L$myNUMA/lib -Wl,-rpath,$myNUMA/lib" \
CFLAGS="-I$myNUMA/include" \
# export myKNEM=/home1/p001cao/local/app/tool_dev/knem1.1.3    
# export myOFI=/home1/p001cao/local/app/tool_dev/libfabric-1.10.1 
--with-verbs=${myOFI} --with-knem=${myKNEM} \

#### 2. Intel
module load intel/compiler-xe19u5
export PATH=/home1/p001cao/local/app/intel/xe19u5/compilers_and_libraries_2019.5.281/linux/bin/intel64:$PATH
export CC=icc CXX=icpc FORTRAN=ifort
export LD_LIBRARY_PATH=/home1/p001cao/local/app/intel/xe19u5/compilers_and_libraries_2019.5.281/linux/compiler/lib/intel64_lin:$LD_LIBRARY_PATH

export LD_LIBRARY_PATH=/home1/p001cao/local/app/tool_dev/glibc-2.18/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
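Writing `export LD_LIBRARY_PATH=/usr/local/lib` without `:$LD_LIBRARY_PATH` throws away the Intel and glibc paths added just before it. A sketch of a helper that always prepends and skips duplicates: `prepend_path` is hypothetical, loosely mirroring the `prepend-path` command of modulefiles.

```shell
#!/bin/sh
# prepend_path VAR DIR (hypothetical helper): put DIR at the front of the
# colon-separated variable VAR, keeping existing entries and skipping
# DIR if it is already there. The variable is exported.
prepend_path() {
    name=$1
    dir=$2
    eval "current=\${$name:-}"
    case ":$current:" in
        *":$dir:"*) ;;  # already present, leave unchanged
        *)
            if [ -z "$current" ]; then
                eval "$name=\$dir"
            else
                eval "$name=\$dir:\$current"
            fi
            ;;
    esac
    export "$name"
}

# Usage (the last call ends up first in the search order):
# prepend_path LD_LIBRARY_PATH /home1/p001cao/local/app/tool_dev/glibc-2.18/lib
# prepend_path LD_LIBRARY_PATH /usr/local/lib
```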

export myKNEM=/home1/p001cao/local/app/tool_dev/knem1.1.3    
export myOFI=/home1/p001cao/local/app/tool_dev/libfabric-1.10.1
../contrib/configure-release --disable-numa --enable-mt LDFLAGS="-fuse-ld=lld -lrt" \
--with-verbs=${myOFI} --with-knem=${myKNEM} \

List of main transports and aliases

all        use all the available transports.
sm         all shared memory transports.
shm        same as "sm".
ugni       ugni_rdma and ugni_udt.
rc         RC (=reliable connection), and UD (=unreliable datagram) for connection bootstrap.
           "accelerated" transports are used if possible.
ud         UD transport, "accelerated" is used if possible.
dc         DC - Mellanox scalable offloaded dynamic connection transport
rc_x       Same as "rc", but using accelerated transports only
rc_v       Same as "rc", but using Verbs-based transports only
ud_x       Same as "ud", but using accelerated transports only
ud_v       Same as "ud", but using Verbs-based transports only
tcp        TCP over SOCK_STREAM sockets
rdmacm     Use RDMACM connection management for client-server API
sockcm     Use sockets-based connection management for client-server API
cuda_copy  Use cu*Memcpy for host<->cuda device self transfers but also to detect cuda memory
gdr_copy   Use GDRcopy library for host<->cuda device self transfers
cuda_ipc   Use CUDA-IPC for cuda device<->device transfers over PCIe/NVLINK
rocm_copy  Use for host-rocm device transfers
rocm_ipc   Use IPC for rocm device-device transfers
self       Loopback transport to communicate within the same process
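A typo in `UCX_TLS` is easy to miss, so a quick check against this table can help. A sketch only: `check_tls` is a hypothetical helper whose list is copied from the table above; UCX itself also accepts more specific transport names that are not listed here, so treat a warning as a prompt to double-check rather than an error.

```shell
#!/bin/sh
# check_tls LIST (hypothetical helper): warn about entries in a
# comma-separated UCX_TLS value that are not in the alias table above.
check_tls() {
    known="all sm shm ugni rc ud dc rc_x rc_v ud_x ud_v tcp rdmacm sockcm cuda_copy gdr_copy cuda_ipc rocm_copy rocm_ipc self"
    status=0
    old_ifs=$IFS
    IFS=,
    for t in $1; do
        case " $known " in
            *" $t "*) ;;
            *) echo "not in the alias table: $t" >&2; status=1 ;;
        esac
    done
    IFS=$old_ifs
    return $status
}

# Usage:
# check_tls "$UCX_TLS"
```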

II. UCX optional Libs

1. rdma-core (fail)
UCX detects the existing libraries on the build machine and enables/disables support for various features accordingly. If some of the modules UCX was built with are not found during runtime, they will be silently disabled.
  • Basic shared memory and TCP support - always enabled
  • Optimized shared memory - requires knem or xpmem drivers. On modern kernels also CMA (cross-memory-attach) mechanism will be used.
  • RDMA support - requires rdma-core or libibverbs library.
  • NVIDIA GPU support - requires CUDA drivers
  • AMD GPU support - requires ROCm drivers
git clone  rdma-core
cd rdma-core
tar xvf rdma-core-30.0.tar.gz
cd rdma-core-30.0
module load compiler/gcc-10.1.0
module load tool_dev/cmake-3.17.2 
module load tool_dev/libnl-3.0
module load tool_dev/libtool-2.4.6

export LD_LIBRARY_PATH=/home1/p001cao/local/app/tool_dev/libnl-3.0/lib:$LD_LIBRARY_PATH ./

2. libnuma-devel
tar xzf numactl-2.0.13.tar.gz
cd numactl-2.0.13
module load tool_dev/autoconf-2.69b
mkdir build && cd build 
../configure --prefix=/home1/p001cao/local/app/tool_dev/numactl-2.0.13
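The numactl, libfabric and KNEM steps all follow the same out-of-tree configure/make/make install pattern. A sketch of that pattern with a dry-run switch: `build_autotools`, `run` and `DRY_RUN` are hypothetical names, and using the numactl prefix as `$myNUMA` is an assumption based on the `-L$myNUMA/lib` and `-I$myNUMA/include` flags above.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_autotools SRC PREFIX [CONFIGURE_ARGS...] (hypothetical helper).
# The body runs in a subshell, so the cd does not leak to the caller.
build_autotools() (
    src=$1
    prefix=$2
    shift 2
    run mkdir -p "$src/build" &&
    run cd "$src/build" &&
    run ../configure --prefix="$prefix" "$@" &&
    run make &&
    run make install
)

# Usage:
# DRY_RUN=1 build_autotools numactl-2.0.13 /home1/p001cao/local/app/tool_dev/numactl-2.0.13
# build_autotools numactl-2.0.13 /home1/p001cao/local/app/tool_dev/numactl-2.0.13
# export myNUMA=/home1/p001cao/local/app/tool_dev/numactl-2.0.13
```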

3. openMPI/UCX: libfabric ()
If building directly from the libfabric git tree, run './' before the configure step.
module load tool_dev/autoconf-2.69b
tar -xvf libfabric-1.11.1.tar.bz2
cd libfabric-1.11.1
module load compiler/gcc-10.2  

## IB cluster
./configure --prefix=/uhome/p001cao/local/app/tool_dev/libfabric-1.11.1-IB

## noIB cluster
./configure --prefix=/uhome/p001cao/local/app/tool_dev/libfabric-1.11.1-noIB

## module 
prepend-path PKG_CONFIG_PATH $topdir/lib/pkgconfig
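The `prepend-path` line above belongs in an Environment Modules (Tcl) modulefile. A sketch that writes a minimal one for an install prefix: `write_modulefile` is a hypothetical helper, and adding `bin` and `lib` besides `PKG_CONFIG_PATH` is an assumption about what you want exposed.

```shell
#!/bin/sh
# write_modulefile PREFIX (hypothetical helper): print a minimal Tcl
# modulefile that exposes PREFIX; \$topdir stays literal for Tcl.
write_modulefile() {
    cat <<EOF
#%Module1.0
set topdir $1
prepend-path PATH \$topdir/bin
prepend-path LD_LIBRARY_PATH \$topdir/lib
prepend-path PKG_CONFIG_PATH \$topdir/lib/pkgconfig
EOF
}

# Usage (the modulefiles directory is whatever your MODULEPATH uses):
# write_modulefile /uhome/p001cao/local/app/tool_dev/libfabric-1.11.1-IB > libfabric-1.11.1-IB
```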

4. openMPI/UCX: KNEM
tar zxvf knem-1.1.4.tar.gz 
cd knem-1.1.4
./configure --prefix=/uhome/p001cao/local/app/tool_dev/knem-1.1.4

--> cannot install: requires Linux kernel 4.x
check: uname -a
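`uname -a` prints a lot; the kernel release alone can be compared directly. A sketch: `kernel_at_least` is a hypothetical helper that relies on GNU `sort -V`, and reading "4.x" as "4.0 or newer" is an assumption.

```shell
#!/bin/sh
# kernel_at_least MIN [RELEASE] (hypothetical helper): succeed if the
# kernel release (default: uname -r) is MIN or newer.
kernel_at_least() {
    min=$1
    release=${2:-$(uname -r)}
    # Keep the leading numeric part, e.g. 3.10.0 from 3.10.0-1160.el7.x86_64
    release=$(echo "$release" | sed 's/[^0-9.].*//')
    [ "$(printf '%s\n%s\n' "$min" "$release" | sort -V | head -n 1)" = "$min" ]
}

if kernel_at_least 4.0; then
    echo "kernel $(uname -r) is 4.0 or newer"
else
    echo "kernel $(uname -r) is older than 4.0"
fi
```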
tar zxvf xpmem-2.6.3.tar.gz
cd xpmem-2.6.3

./configure --prefix=/home1/p001cao/local/app/tool_dev/xpmem-2.6.3
