H5CPP Documentation

Easy to use HDF5 C++11 compiler assisted templates for HDF5

Hierarchical Data Format prevalent in high performance scientific computing, sits directly on top of sequential or parallel file systems, providing block and stream operations on standardized or custom binary/text objects. Scientific computing platforms such as Python, R, Matlab, Fortran, Julia [and many more...] come with the necessary libraries to read write HDF5 dataset. This edition simplifies interactions with popular linear algebra libraries, provides compiler assisted seamless object persistence, Standard Template Library support and equipped with novel error handling architecture.

How simple can it get?

h5::write( "arma.h5", "arma vec inside matrix", V // object contains 'count' and rank being written
,h5::current_dims{40,50} // control file_space directly where you want to place vector
,h5::offset{5,0} // when no explicit current dimension given current dimension := offset .+ object_dim .* stride (hadamard product)
,h5::stride{4,4}, h5::block{3,3}
,h5::max_dims{40,H5S_UNLIMITED} )
// wouldn't it be nice to have unlimited dimension?
// if no explicit chunk is set, then the object dimension is
// used as unit chunk

The above example is to demonstrate partial create + write with extendable datasets which can be turned into high performance packet table: h5::pt_t by a simple conversion h5::pt_t pt = h5::open( ... ) , h5::pt_t pt = h5::create(...) or h5::pt_t pt = ds where h5::ds_t ds = ....

h5::create | h5::read | h5::write | h5::append take RAII enabled descriptors or CAPI hid_t ones – depending on conversion policy: seamless or controlled fashion. The optional arguments may be placed in any order, compile time computed and come with intelligent compile time error messages.

h5::write( "arma.h5", "arma vec inside matrix", V
,h5::stride{4,4}, h5::block{3,3}, h5::current_dims{40,50}
,h5::offset{5,0}, h5::max_dims{40,H5S_UNLIMITED} );

Wouldn't it be nice not to have to remember everything? Focus on the idea, write it out intuitively and refine it later. The function construct below compiles into the same instructions as above.

h5::write( "arma.h5", "arma vec inside matrix", V
,h5::max_dims{40,H5S_UNLIMITED}, h5::stride{4,4}, h5::current_dims{40,50}
,h5::block{3,3}, h5::offset{5,0}, );

Why should you know about HDF5 at all, isn't it work about ideas? the details can be filled in later when needed:

// supported objects: raw pointers | armadillo | eigen3 | blaze | blitz++ | it++ | dlib | uBlas | std::vector
arma::vec V(4); // some vector: armadillo, eigen3,
h5::write( "arma.h5", "path/to/dataset name", V);

How about some really complicated POD struct type that your client or colleague wants to see in action right now? Invoke clang++ based h5cpp compiler on the translation unit – group of files which are turned into a single object file – and call it done!

namespace sn {
namespace typecheck {
struct Record { /*the types with direct mapping to HDF5*/
char _char; unsigned char _uchar; short _short; unsigned short _ushort; int _int; unsigned int _uint;
long _long; unsigned long _ulong; long long int _llong; unsigned long long _ullong;
float _float; double _double; long double _ldouble;
bool _bool;
// wide characters are not supported in HDF5
// wchar_t _wchar; char16_t _wchar16; char32_t _wchar32;
};
}
namespace other {
struct Record { // POD struct with nested namespace
MyUInt idx; // typedef type
MyUInt aa; // typedef type
double field_02[3]; // const array mapped
typecheck::Record field_03[4]; //
};
}
namespace example {
struct Record { // POD struct with nested namespace
MyUInt idx; // typedef type
float field_02[7]; // const array mapped
sn::other::Record field_03[5]; // embedded Record
sn::other::Record field_04[5]; // must be optimized out, same as previous
other::Record field_05[3][8]; // array of arrays
};
}
namespace not_supported_yet {
// NON POD: not supported in phase 1
// C++ Class -> PODstruct -> persistence[ HDF5 | ??? ] -> PODstruct -> C++ Class
struct Container {
double idx; //
std::string field_05; // c++ object makes it non-POD
std::vector<example::Record> field_02; // ditto
};
}
/* BEGIN IGNORED STRUCT */
// these structs are not referenced with h5::read|h5::write|h5::create operators
// hence compiler should ignore them.
struct IgnoredRecord {
signed long int idx;
float field_0n;
};
/* END IGNORED STRUCTS */

I did try to make the above include file as ugly and complicated as I could. But do you really need to read it? What if you had a technology at your disposal that can do it for you? In your code all you have to do is to trigger the compiler by making any h5:: operations on the desired data structure. It works without the hmmm boring details?

...
std::vector<sn::example::Record> vec
= h5::utils::get_test_data<sn::example::Record>(20);
// mark vec with an h5:: operator and delegate
// the details to h5cpp compiler
h5::write(fd, "orm/partial/vector one_shot", vec );
...

H5CPP is a novel approach to persistence in the field of machine learning, it provides high performance sequential and block access to HDF5 containers through modern C++.

Templates:

create dataset within an opened hdf5 file

file ::= const h5::fd_t& fd | const std::string& file_path;
dataspace ::= const h5::sp_t& dataspace | const h5::current_dims& current_dim [, const h5::max_dims& max_dims ] |
[,const h5::current_dims& current_dim] , const h5::max_dims& max_dims;
template <typename T> h5::ds_t create( file, const std::string& dataset_path, dataspace,
[, const h5::lcpl_t& lcpl] [, const h5::dcpl_t& dcpl] [, const h5::dapl_t& dapl] );

read a dataset and return a reference of the created object

dataset ::= (const h5::fd_t& fd | const std::string& file_path, const std::string& dataset_path ) | const h5::ds_t& ds;
template <typename T> T read( dataset
[, const h5::offset_t& offset] [, const h5::stride_t& stride] [, const h5::count_t& count]
[, const h5::dxpl_t& dxpl ] ) const;
template <typename T> h5::err_t read( dataset, T& ref
[, const h5::offset_t& offset] [, const h5::stride_t& stride] [, const h5::count_t& count]
[, const h5::dxpl_t& dxpl ] ) [noexcept] const;

write dataset into a specified location

dataset ::= (const h5::fd_t& fd | const std::string& file_path, const std::string& dataset_path ) | const h5::ds_t& ds;
template <typename T> h5::err_t write( dataset, const T* ptr
[,const hsize_t* offset] [,const hsize_t* stride] ,const hsize_t* count [, const h5::dxpl_t dxpl ] ) noexcept;
template <typename T> h5::err_t write( dataset, const T& ref
[,const h5::offset_t& offset] [,const h5::stride_t& stride] [,const& h5::dxcpl_t& dxpl] ) [noexept];

append to extendable C++/C struct dataset

#include <h5cpp/core>
#include "your_data_definition.h"
#include <h5cpp/io>
template <typename T> void h5::append(h5::pt_t& ds, const T& ref) [noexcept];

All file and dataset io descriptors implement raii idiom and close underlying resource when going out of scope, and may be seamlessly passed to HDF5 CAPI calls when implicit conversion enabled. Similarly templates can take CAPI hid_t identifiers as arguments where applicable provided conversion policy allows. See conversion policy for details.

how to use:

sudo make install will copy the header files and h5cpp.pc package config file into /usr/local/ or copy them and ship it with your project. There is no other dependency than hdf5 libraries and include files. However to activate the template specialization for any given library you must include that library first then h5cpp. In case the auto detection fails turn template specialization on by defining:

#define [ H5CPP_USE_BLAZE | H5CPP_USE_ARMADILLO | H5CPP_USE_EIGEN3 | H5CPP_USE_UBLAS_MATRIX
| H5CPP_USE_UBLAS_VECTOR | H5CPP_USE_ITPP_MATRIX | H5CPP_USE_ITPP_VECTOR | H5CPP_USE_BLITZ | H5CPP_USE_DLIB | H5CPP_USE_ETL ]

if you're interested in h5cpp compiler please visit this page to recreate the required clang7.0 toolchain.

supported classes:

1 T := ([unsigned] ( int8_t | int16_t | int32_t | int64_t )) | ( float | double )
2 S := T | c/c++ struct | std::string
3 ref := std::vector<S>
4  | arma::Row<T> | arma::Col<T> | arma::Mat<T> | arma::Cube<T>
5  | Eigen::Matrix<T,Dynamic,Dynamic> | Eigen::Matrix<T,Dynamic,1> | Eigen::Matrix<T,1,Dynamic>
6  | Eigen::Array<T,Dynamic,Dynamic> | Eigen::Array<T,Dynamic,1> | Eigen::Array<T,1,Dynamic>
7  | blaze::DynamicVector<T,rowVector> | blaze::DynamicVector<T,colVector>
8  | blaze::DynamicVector<T,blaze::rowVector> | blaze::DynamicVector<T,blaze::colVector>
9  | blaze::DynamicMatrix<T,blaze::rowMajor> | blaze::DynamicMatrix<T,blaze::colMajor>
10  | itpp::Mat<T> | itpp::Vec<T>
11  | blitz::Array<T,1> | blitz::Array<T,2> | blitz::Array<T,3>
12  | dlib::Matrix<T> | dlib::Vector<T,1>
13  | ublas::matrix<T> | ublas::vector<T>
14 ptr := T*
15 accept := ref | ptr

In addition to the standard data types offered by BLAS/LAPACK systems and POD struct -s, std::vector also supports std::string data-types mapping N dimensional variable-length C like string HDF5 data-sets to std::vector<std::string> objects.

short examples:

to read/map a 10x5 matrix from a 3D array from location {3,4,1}

#include <armadillo>
#include <h5cpp/all>
...
auto fd = h5::open("some_file.h5", H5F_ACC_RDWR);
/* the RVO arma::Mat<double> object will have the size 10x5 filled*/
try {
/* will drop extents of unit dimension returns a 2D object */
auto M = h5::read<arma::mat>(fd,"path/to/object",
h5::offset{3,4,1}, h5::count{10,1,5}, h5::stride{3,1,1} ,h5::block{2,1,1} );
} catch (const std::runtime_error& ex ){
...
}
// fd closes underlying resource (raii idiom)

to write the entire matrix back to a different file

#include <Eigen/Dense>
#include <h5cpp/all>
h5::fd_t fd = h5::create("some_file.h5",H5F_ACC_TRUNC);
h5::write(fd,"/result",M);

to create an dataset recording a stream of struct into an extendable chunked dataset with GZIP level 9 compression:

#include <h5cpp/core>
#include "your_data_definition.h"
#include <h5cpp/io>
...
auto ds = h5::create<some_type>(fd,"bids", h5::max_dims{H5S_UNLIMITED}, h5::chunk{1000} | h5::gzip{9});

to append records to an HDF5 datastream

#include <h5cpp/core>
#include "your_data_definition.h"
#include <h5cpp/io>
auto fd = h5::create("NYSE high freq dataset.h5");
h5::pt_t pt = h5::create<ns::nyse_stock_quote>( fd,
"price_quotes/2018-01-05.qte",h5::max_dims{H5S_UNLIMITED}, h5::chunk{1024} | h5::gzip{9} );
quote_update_t qu;
bool having_a_good_day{true};
while( having_a_good_day ){
try{
recieve_data_from_udp_stream( qu )
h5::append(pt, qu);
} catch ( ... ){
if( cant_fix_connection() )
having_a_good_day = false;
}
}