AWS EC2 cluster setup for parallel and serial HDF5 on Ubuntu 16.04 LTS systems

For now H5CPP has a strict C++17 requirement, which will eventually be relaxed to C++14. The easiest way to start is to take a generic Ubuntu LTS image and go through the following steps:
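
A quick way to see whether a compiler is already C++17-capable is to preprocess the __cplusplus macro in C++17 mode. The shell sketch below is not part of H5CPP; check_cxx17 is a hypothetical helper name, and it assumes a GCC- or Clang-style driver that accepts -std=c++17 -E -x c++ - and reads the source from stdin. A value of 201703 or more counts as C++17.

```bash
# check_cxx17 [COMPILER]: succeed if COMPILER (default g++) accepts -std=c++17
# and reports __cplusplus >= 201703 in that mode. Hypothetical helper name.
check_cxx17() {
  local cxx="${1:-g++}" value
  command -v "$cxx" >/dev/null 2>&1 || return 1
  # Preprocess a one-line translation unit; the last output line is the expanded macro.
  value=$(echo __cplusplus | "$cxx" -std=c++17 -E -x c++ - 2>/dev/null | tail -n 1)
  value=${value%L}                      # strip the trailing 'L' of e.g. 201703L
  case "$value" in ''|*[!0-9]*) return 1 ;; esac
  [ "$value" -ge 201703 ]
}
```

For example, check_cxx17 g++-8 && echo "g++-8 speaks C++17". A compiler that predates C++17 either rejects -std=c++17 or reports a smaller value, so the check fails.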

GCC-8 from binary, courtesy of Jonathon F

sudo add-apt-repository ppa:jonathonf/gcc-8.1
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install gcc-8 g++-8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 100 --slave /usr/bin/g++ g++ /usr/bin/g++-8
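
After switching alternatives it is worth confirming that plain gcc now resolves to GCC 8. A minimal shell sketch, assuming GNU sort with -V for version ordering; version_ge and gcc_at_least are hypothetical helper names.

```bash
# version_ge A B: succeed if dotted version A >= B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# gcc_at_least WANT [COMPILER]: succeed if COMPILER (default gcc) is >= WANT.
gcc_at_least() {
  local want="$1" cc="${2:-gcc}" have
  # -dumpversion prints e.g. "8" or "8.1.0", depending on how GCC was configured.
  have=$("$cc" -dumpversion 2>/dev/null) || return 1
  version_ge "$have" "$want"
}
```

Usage: gcc_at_least 8 && gcc_at_least 8 g++ && echo "gcc and g++ are both 8 or newer".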

CMake 3.11, the most recent version at the time of writing, can be cloned from this GitHub repository:

git clone
cd CMake && ./bootstrap && make
sudo make install
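
Since the bootstrap script installs under /usr/local by default, /usr/local/bin has to come before /usr/bin in PATH for the new CMake to shadow the older distribution package. A minimal shell sketch to confirm which version is picked up; version_ge, cmake_version_from and cmake_at_least are hypothetical helper names, and the parser assumes the first line of cmake --version reads "cmake version X.Y.Z".

```bash
# version_ge A B: succeed if dotted version A >= B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# cmake_version_from: read `cmake --version` output on stdin, print "X.Y.Z".
cmake_version_from() {
  awk 'NR == 1 && $1 == "cmake" && $2 == "version" { print $3 }'
}

# cmake_at_least WANT: succeed if the cmake found first in PATH is >= WANT.
cmake_at_least() {
  command -v cmake >/dev/null 2>&1 || return 1
  version_ge "$(cmake --version | cmake_version_from)" "$1"
}
```

Usage: hash -r; cmake_at_least 3.11 || echo "old cmake still first in PATH: $(command -v cmake)". The hash -r clears bash's cache of command locations, so a freshly installed binary is found.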

LLVM 7.0.0 and Clang are required to compile and run the h5cpp source code transformation tool. Make sure the system you are compiling on has at least 50GB of free disk space and 16GB of memory; otherwise you may get spurious error messages in the compile or link phase. An AWS EC2 m3.2xlarge instance has suitable local disk and memory.

git clone
cd llvm/tools
git clone
# begin-optional
cd ../projects
git clone
git clone
git clone
# end-optional
cd ../../ && mkdir build && cd build
nohup make -j8 &
# once the background build has finished (progress is logged to nohup.out):
sudo make install
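
Before kicking off the LLVM build, the disk and memory requirements can be checked up front. A minimal shell sketch, assuming Linux (/proc/meminfo) and a df that supports the POSIX -P output format; have_build_resources is a hypothetical helper name, and it treats GB as GiB.

```bash
# have_build_resources DIR DISK_GB MEM_GB
# Succeed if the file system holding DIR has at least DISK_GB GiB available
# and the machine has at least MEM_GB GiB of RAM (MemTotal in /proc/meminfo).
have_build_resources() {
  local dir="$1" disk_gb="$2" mem_gb="$3" avail_kb mem_kb
  # POSIX df output: line 2, column 4 is the available space in 1K blocks.
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  # MemTotal is reported in kB.
  mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null)
  [ -n "$avail_kb" ] && [ -n "$mem_kb" ] || return 1
  [ "$avail_kb" -ge $((disk_gb * 1024 * 1024)) ] &&
    [ "$mem_kb" -ge $((mem_gb * 1024 * 1024)) ]
}
```

Usage, from the directory that will hold the build: have_build_resources . 50 16 || echo "warning: less than 50GB free disk or 16GB RAM" >&2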

For parallel HDF5 you need a POSIX-compliant parallel (duh) file system. OrangeFS is a good free-software alternative to commercial solutions. This step is optional; the alternative is to use single-writer/multiple-reader (SWMR) mode, with a dedicated MPI process for I/O on each compute node.

sudo apt-get install -y gcc flex bison libssl-dev libdb-dev linux-source perl make autoconf linux-headers-$(uname -r) zip openssl automake patch g++ libattr1-dev
./configure --with-kernel=/usr/src/linux-headers-$(uname -r) --prefix=/usr/local --enable-shared
make -j4 && sudo make install
make kmod # build the kernel module
sudo make kmod_install
# load the module for the running kernel
sudo insmod /lib/modules/$(uname -r)/kernel/fs/pvfs2/pvfs2.ko
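
To confirm the module is actually loaded (the same information lsmod shows), a small shell sketch can read /proc/modules directly; module_loaded is a hypothetical helper name.

```bash
# module_loaded NAME: succeed if a kernel module called NAME is currently loaded.
module_loaded() {
  # /proc/modules lists one loaded module per line, module name in column 1.
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' /proc/modules 2>/dev/null
}
```

Usage: module_loaded pvfs2 || echo "pvfs2 kernel module is not loaded" >&2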

MPI is the industry standard for supercomputing and a viable alternative to Hadoop on clusters. Be sure to enable Grid Engine support and set the C compiler to gcc-5, since the build failed with gcc-8 on my install. Don't forget to verify SGE support afterwards by running ompi_info | grep gridengine.

gunzip -c openmpi-3.1.1.tar.gz | tar xf -
cd openmpi-3.1.1
# gcc-8 failed for me (July 2018), so configure with gcc-5
CC=gcc-5 ./configure --with-sge --with-pvfs2 --prefix=/usr/local
make -j4 && sudo make install
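
The ompi_info | grep gridengine check can be tightened so it matches a component entry rather than any mention of the word. A minimal shell sketch; ompi_has_component is a hypothetical helper name, and it assumes ompi_info lists components on lines of the form "MCA <framework>: <component> (...)", for example "MCA ras: gridengine (...)".

```bash
# ompi_has_component NAME: read ompi_info output on stdin and succeed if an
# MCA component called NAME is listed, e.g. "MCA ras: gridengine (...)".
ompi_has_component() {
  grep -Eq "MCA [a-z0-9_]+: $1( |\$)"
}
```

Usage: ompi_info | ompi_has_component gridengine && echo "Open MPI was built with SGE support"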

SGE can be downloaded from here: TODO add link


sudo apt install libboost-math-dev # ublas support
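
Because Boost uBLAS is header-only, a syntax-only compile of a tiny program is enough to confirm the headers are found. A minimal shell sketch; check_ublas is a hypothetical helper name, and it assumes a GCC- or Clang-style driver.

```bash
# check_ublas [COMPILER]: syntax-check a tiny program that includes Boost uBLAS.
check_ublas() {
  local cxx="${1:-g++}"
  command -v "$cxx" >/dev/null 2>&1 || return 1
  # -fsyntax-only: parse and type-check only, no object file is written.
  "$cxx" -fsyntax-only -x c++ - 2>/dev/null <<'EOF'
#include <boost/numeric/ublas/matrix.hpp>
int main() {
    boost::numeric::ublas::matrix<double> m(2, 2);  // 2x2 dense matrix
    return m.size1() == 2 ? 0 : 1;
}
EOF
}
```

Usage: check_ublas g++ && echo "Boost uBLAS headers found"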

Best practice is to install all linear algebra packages from source, starting with BLAS/LAPACK (Intel MKL, AMD ACML, ATLAS, OpenBLAS or Netlib) and then your C++ linear algebra/scientific library. Make sure the optimized BLAS/LAPACK is the one picked up during configuration. Beyond the standard functionality you may also be interested in SuperLU, METIS, PARDISO, SuiteSparse, UMFPACK and CHOLMOD.
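
One way to see which BLAS/LAPACK an already built, dynamically linked program resolves to at run time is to inspect its shared library dependencies. A minimal shell sketch, assuming a glibc system with ldd; linked_blas is a hypothetical helper name, and matching on blas, lapack, mkl and atlas in library names is only a heuristic.

```bash
# linked_blas BINARY: print the shared BLAS/LAPACK-like libraries BINARY
# resolves to; returns non-zero if none are found.
linked_blas() {
  ldd "$1" 2>/dev/null | grep -Ei 'blas|lapack|mkl|atlas'
}
```

Usage: linked_blas ./my_app, where ./my_app is a placeholder for your own binary; the resolved paths show which library the dynamic linker actually picked.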

Here is the list of supported C++ scientific/linear algebra libraries: Armadillo, Eigen3, Blitz++, Blaze, dlib, IT++ and Boost uBLAS; ETL will be added soon. If I left out your favourite, or it lacks functionality, please shoot me an email.