Wednesday, May 13, 2020

Threading Building Blocks (TBB)

Why: needed for Kokkos
Intel® Threading Building Blocks (Intel® TBB) is a library that supports scalable parallel programming using standard ISO C++ code. It does not require special languages or compilers. It is designed to promote scalable data parallel programming. Additionally, it fully supports nested parallelism, so you can build larger parallel components from smaller parallel components. To use the library, you specify tasks, not threads, and let the library map tasks onto threads in an efficient manner.
Many of the library interfaces employ generic programming, in which interfaces are defined by requirements on types and not specific types. The C++ Standard Template Library (STL) is an example of generic programming. Generic programming enables Intel TBB to be flexible yet efficient. The generic interfaces enable you to customize components to your specific needs.
The net result is that Intel TBB enables you to specify parallelism far more conveniently than using raw threads, and at the same time can improve performance.
Intel® Threading Building Blocks is available commercially as a binary distribution, and as open source in both source and binary forms.

1. Require: glibc 2.17
check: ldd --version
The GNU C Library, commonly known as glibc, is the GNU Project's implementation of the C standard library. 
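
The check above can be scripted. This is a minimal sketch, assuming that 2.17 is a minimum version and that `sort -V` (GNU coreutils) is available; the helper names `version_ge` and `glibc_version` are made up for this note.

```shell
#!/usr/bin/env bash
# Succeed if version $1 is greater than or equal to version $2.
# Relies on GNU `sort -V` for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# The first line of `ldd --version` ends with the glibc version on glibc systems.
glibc_version() {
    ldd --version 2>/dev/null | head -n 1 | awk '{print $NF}'
}

required="2.17"   # assumed to be a minimum, taken from the requirement above
current="$(glibc_version)"
if [ -n "$current" ] && version_ge "$current" "$required"; then
    echo "glibc $current satisfies the requirement (>= $required)"
else
    echo "could not confirm glibc >= $required (found: ${current:-unknown})"
fi
```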

2. Compile TBB
configuration manual: oneTBB-2020.2/cmake/README.rst
tar zxvf oneTBB-2020.2.tar.gz
cd oneTBB-2020.2/build
#-- comment out these lines from the "build/" file:
# gcc 4.8 and later support RTM intrinsics, but require command line switch to enable them
ifneq (,$(shell gcc -dumpversion | egrep  "^4\.[8-9]"))
    RTM_KEY = -mrtm
module load compiler/gcc-9.2.0 
module load tool_dev/cmake-3.17.2
module load tool_dev/glibc-2.19
export LD_LIBRARY_PATH=/home1/p001cao/local/app/tool_dev/glibc-2.19/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib

cd oneTBB-2020.2/src
make -j 8

This produces a new folder "build/linux_intel64_gcc_cc9.2.0_libc2.12_kernel2.6.32_release" that contains the TBB libraries.
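
To compile and run against the freshly built libraries, the compiler and the loader need to find them. A rough sketch, assuming the headers live in "include/" under the source root and the libraries in the release folder named above; the function name `use_tbb` and the example root path are hypothetical.

```shell
#!/usr/bin/env bash
# Point the compiler (CPATH, LIBRARY_PATH) and the runtime loader
# (LD_LIBRARY_PATH) at a makefile-built TBB tree.
# Usage: use_tbb <tbb_source_root> <release_folder_name>
use_tbb() {
    local tbb_root="$1" build_name="$2"
    local libdir="${tbb_root}/build/${build_name}"
    export TBBROOT="$tbb_root"
    # ${VAR:+:$VAR} appends the old value only when it is non-empty.
    export CPATH="${tbb_root}/include${CPATH:+:$CPATH}"
    export LIBRARY_PATH="${libdir}${LIBRARY_PATH:+:$LIBRARY_PATH}"
    export LD_LIBRARY_PATH="${libdir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Example with a hypothetical unpack location:
use_tbb "$HOME/oneTBB-2020.2" "linux_intel64_gcc_cc9.2.0_libc2.12_kernel2.6.32_release"
echo "TBB libraries expected in: ${TBBROOT}/build/linux_intel64_gcc_cc9.2.0_libc2.12_kernel2.6.32_release"
```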

Wednesday, May 6, 2020


1. Cmake

Building CMake requires a C++ compiler, so load one before installing.

Compile from source and install
Download CMake from:
tar zxvf cmake-3.21.2.tar.gz
cd cmake-3.21.2
module load compiler/gcc-10.3
# USC 2:
./configure --prefix=/home1/p001cao/local/app/tool_dev/cmake-3.20.3

# USC 1:
./configure --prefix=/uhome/p001cao/local/app/tool_dev/cmake-3.21

make -j8
make install
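
The version number appears in the tarball name, the source folder and the install prefix, so it is easy for them to drift apart. A small sketch that derives all three from one version string; the helper names `run` and `build_cmake` are invented here, and by default it only prints the steps (set DRY_RUN=0 to execute them).

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: build_cmake <version> <app_root> [jobs]
build_cmake() {
    local version="$1" app_root="$2" jobs="${3:-8}"
    local src="cmake-${version}"
    run tar zxvf "${src}.tar.gz" &&
    run cd "${src}" &&
    run ./configure "--prefix=${app_root}/${src}" &&
    run make "-j${jobs}" &&
    run make install
}

# Dry run for the USC 2 layout used in these notes:
build_cmake 3.21.2 /home1/p001cao/local/app/tool_dev
```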

create module file
cd /uhome/p001cao/local/Imodfiles 
 create file "cmake-3.17.2"

# for Tcl script use only
set             topdir      /uhome/p001cao/local/app/tool_dev/cmake-3.18.0
set             version     cmake-3.18.0
setenv          cmake       $topdir

prepend-path    PATH        $topdir/bin
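
The module file can also be generated from the shell, so that the file name, version and topdir always agree. A sketch under a few assumptions: the function name `write_modulefile` is hypothetical, the first line uses the standard `#%Module1.0` magic cookie that Environment Modules expects in Tcl modulefiles, and the demo writes into a scratch folder rather than the real module directory.

```shell
#!/usr/bin/env bash
# Write a Tcl modulefile named <name>-<version> into <modfile_dir>.
# Usage: write_modulefile <name> <version> <app_root> <modfile_dir>
write_modulefile() {
    local name="$1" version="$2" app_root="$3" moddir="$4"
    local tag="${name}-${version}"
    mkdir -p "$moddir"
    # \$topdir is escaped so it stays a Tcl variable in the output file.
    cat > "${moddir}/${tag}" <<EOF
#%Module1.0
# for Tcl script use only
set         topdir      ${app_root}/${tag}
set         version     ${tag}
setenv      ${name}     \$topdir

prepend-path    PATH    \$topdir/bin
EOF
    echo "${moddir}/${tag}"
}

# Demo in a scratch folder; replace it with the real module folder when ready.
demo_dir="$(mktemp -d)"
modfile="$(write_modulefile cmake 3.21.2 /uhome/p001cao/local/app/tool_dev "$demo_dir")"
cat "$modfile"
```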

Validate installation
module load cmake-3.17.2
cmake --version

autoconf:
cd autoconf-2.69b
./configure --prefix=/home1/p001cao/local/app/tool_dev/autoconf-2.69b

2a. automake:

module load tool_dev/autoconf-2.69b

tar xvzf automake-1.14.tar.gz
cd automake-1.14
./configure --prefix=/home1/p001cao/local/app/tool_dev/automake-1.14

2b. update pkg-config (optional)
tar xvzf pkg-config-0.29.2.tar.gz
cd pkg-config-0.29.2
./configure --prefix=/home1/p001cao/local/app/tool_dev/pkg-config-0.29

2c. libtool:
git clone git://
tar xvfz libtool-2.4.6.tar.gz
cd libtool-2.4.6
./configure --prefix=/home1/p001cao/local/app/tool_dev/libtool-2.4.6

### use 
export ACLOCAL_PATH=/home1/p001cao/local/app/tool_dev/libtool-2.4.6/share/aclocal

2d. autogen: (may not need)
2d1. Guile:
tar xvf guile-3.0.4.tar.gz
cd guile-3.0.4
module load compiler/gcc-10.2
module load tool_dev/libtool-2.4.6

./configure --with-libltdl-prefix=/home1/p001cao/local/app/tool_dev/libtool-2.4.6 \

2d2. autogen:
tar xvfz autogen-5.18.16.tar.gz
cd autogen-5.18.16

./configure --prefix=/home1/p001cao/local/app/tool_dev/autogen-5.18.16

II. lld

cd Tooldev
git clone llvm-master

cd llvm-master
mkdir build && cd build
module load tool_dev/cmake-3.17.2
module load conda/py37clangSupp
module load compiler/gcc-10.2
export myGCC=/home1/p001cao/local/app/compiler/gcc-10.2/bin
cmake ../llvm -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release \

2. lld may be available with the Intel compiler

III. libxml2: (can use conda instead)

wget
tar zxf libxml2-2.9.9.tar.gz
cd libxml2-2.9.9
./configure --prefix=/home1/p001cao/local/app/tool_dev/libxml2-2.9.9 --without-python
make
make install
prepend-path PKG_CONFIG_PATH $topdir/lib/pkgconfig

4. Update glibc

strings /lib64/|grep GLIBC     # for all versions. 
wget         # may error with other version, need root
tar xvf glibc-2.18.tar.gz
cd glibc-2.18
mkdir build && cd build

../configure --disable-profile --enable-add-ons \
--with-binutils=/home1/p001cao/local/app/tool_dev/binutils-2.32 \
ln -s /home1/p001cao/local/app/tool_dev/glibc-2.17/   /lib64/  
check:  ldd --version
export LD_LIBRARY_PATH=/home1/p001cao/local/app/tool_dev/glibc-2.15/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib
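
Exporting LD_LIBRARY_PATH changes library lookup for every later command in the shell, which is why it has to be reset afterwards. As an alternative sketch, the variable can be set for a single command only; the helper name `with_libpath` and the demo path are hypothetical.

```shell
#!/usr/bin/env bash
# Run one command with an extra folder at the front of LD_LIBRARY_PATH.
# The calling shell's own LD_LIBRARY_PATH is left unchanged.
# Usage: with_libpath <libdir> <command> [args...]
with_libpath() {
    local libdir="$1"
    shift
    LD_LIBRARY_PATH="${libdir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Demo: show the value seen by the child process only.
with_libpath /opt/example/lib sh -c 'echo "child sees: $LD_LIBRARY_PATH"'
echo "parent still has: ${LD_LIBRARY_PATH:-<unset>}"
```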

## Module file (does not work)
if { [module-info mode load] || [module-info mode switch2] } {
    puts stdout "export LD_LIBRARY_PATH=/home1/p001cao/local/app/tool_dev/glibc-2.19/lib:$LD_LIBRARY_PATH;"
    puts stdout "export LD_LIBRARY_PATH=/usr/local/lib;"
} elseif { [module-info mode remove] && ![module-info mode switch3] } {
    puts stdout "export LD_LIBRARY_PATH=/usr/local/lib;"
}
export myBinutils=/home1/p001cao/local/app/tool_dev/binutils-2.32

## use
export myGLIBC=/home1/p001cao/local/app/tool_dev/glibc-2.15/lib
LDFLAGS="-L${myGLIBC} -Wl,-rpath,${myGLIBC} -Wl,--dynamic-linker,${myGLIBC}/" \

6. GSL
* was needed to link the LAPACK and BLAS libraries when installing PLUMED in LAMMPS (no longer needed)

./configure --prefix=/home1/p001cao/local/app/tool_dev/gsl-2.6

B. linker

Use conda to manage the linker
module load conda/conda3
conda create -n py37linker python=3.7
prepend-path PKG_CONFIG_PATH $topdir/lib/pkgconfig
source activate py37linker 
conda install autoconf automake cmake ninja libtool             # build tools
conda install -c conda-forge lld=9.0.1 binutils                        # LLVM (lld) & gold linker
conda install -c conda-forge numactl-devel-cos6-x86_64                      # numa openmp
conda install -c asmeurer glibc
LDFLAGS="-fuse-ld=lld -lrt"

Tuesday, May 5, 2020


Which MPI implementation?
MVAPICH2 (MPI-3 over InfiniBand) is an MPI-3 implementation based on the MPICH ADI3 layer.
tar -xzf mvapich2-2.3.2.tar.gz
module load conda/py37mvapichSupp 
cd mvapich2-2.3.2

I. install MVAPICH2 + GCC (USC)

1. Supporting: 
module load conda/conda3
conda create -n py37mvapichSupp python=3.7
source activate py37mvapichSupp 
conda install autoconf automake 
conda install -c sas-institute libnuma
conda install -c conda-forge lld=9.0.1 binutils    # LLVM (lld) & gold linker
prepend-path PKG_CONFIG_PATH $topdir/lib/pkgconfig

2. Configuration

#2.2. USC 2:
module load compiler/gcc-9.2.0   
module load conda/py37mvapichSupp         # to use gold linker or lld linker
./configure CC=gcc CXX=g++ FC=gfortran F77=gfortran LDFLAGS="-fuse-ld=gold" \
--with-device=ch3:mrail --with-rdma=gen2 --enable-hybrid \


The job example '' starts the 'xhpl' program. Note that an MPI job has to start 'mpirun_rsh' with the option "-np $NSLOTS" so that it runs with the correct number of slots ($NSLOTS is set by Grid Engine).
To tell it where to start the MPI tasks, pass "-hostfile $TMPDIR/machines" as the second argument.

Additionally, for tight integration remember to use "-rsh " and, optionally, "-nowd" to prevent MVAPICH from running 'cd $wd' on the remote hosts.
This leaves SGE in charge of the working directory.
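
A job script along these lines might look as follows. This is only a sketch: the job name, the parallel environment name "mpi", the slot count and the helper `run_mpi` are assumptions, and the position of "-rsh" relative to the other options is a guess. Outside Grid Engine (no JOB_ID set) it prints the command instead of running it.

```shell
#!/usr/bin/env bash
#$ -N xhpl_job        # job name (hypothetical)
#$ -pe mpi 16         # parallel environment and slot count (hypothetical)
#$ -S /bin/bash

# Assemble the mpirun_rsh call from the variables Grid Engine provides.
# With DRY_RUN=1 the command is printed instead of executed.
run_mpi() {
    local program="$1"
    local cmd=(mpirun_rsh -rsh -np "$NSLOTS" -hostfile "$TMPDIR/machines" "$program")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

if [ -n "${JOB_ID:-}" ]; then
    # Inside a Grid Engine job: NSLOTS and TMPDIR are set by the scheduler.
    run_mpi ./xhpl
else
    # Outside Grid Engine: show what would be run, with placeholder values.
    DRY_RUN=1 NSLOTS=4 TMPDIR=/tmp run_mpi ./xhpl
fi
```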