Hierarchical Data Format (HDF5)

HDF5 is a file format designed to store and organize large amounts of numerical data.
HDF5: API Specification

HDF5:
  • is binary (so efficient)
  • handles multidimensional data
  • has a wide range of types, including complex and double precision
  • deals with endian issues
  • stores data hierarchically: datasets (i.e. files) are kept in a tree of groups (i.e. directories)
  • stores metadata as attributes, attached to groups or to datasets.
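To make the endian point concrete, here is a minimal sketch in plain C of the byte shuffling you would otherwise do by hand when a file written on one architecture is read on another. It does not call HDF5 at all, and the function names are made up for illustration; HDF5 records the byte order with each datatype so that this kind of code is unnecessary.

```c
#include <stdint.h>

/* Return 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

/* Interpret 4 raw bytes from a big-endian file as a host-order integer.
   Works unchanged on either kind of host. */
uint32_t from_big_endian32(const unsigned char bytes[4])
{
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] <<  8) |  (uint32_t)bytes[3];
}
```

With a raw binary format, every reader has to know which of these conversions applies; with HDF5 the library works it out from the file.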

HDF5 can be read and written by:
  • Mathematica (see below)
  • Matlab
  • Python (h5py), also PyTables (see below)
  • Labview - but see below.
  • Quite a lot of other languages which we don't use in the lab, e.g. Perl and IDL.

Labview and HDF5

LabVIEW has in the past used HDF5 as an internal format, but only wrote a limited and specific subset of it. General LabVIEW support seems to be some time away. NI seems committed to its TDMS file format (proprietary but well documented; doesn't support 2D arrays etc.). This lavag post seems to indicate that NI tried HDF5 and found performance issues. That is slightly worrying, but we are unlikely to need to append >100 separate data streams. The post also discusses NI's commitment to TDMS and to extending it, which is disappointing given the existence of HDF5 and other open standards like XSIL.

The best available Labview library seems to be LVHDF5 - based on HDF5 1.6.5.

There is mailing list evidence that Tomi Maila is developing a library based on the more recent HDF5 1.8, but it doesn't seem to have been publicly released.

LVHDF5

Some limitations need to be investigated. In particular:

Arrays: "Only conversion of 1-D LabVIEW arrays is supported. Note that datasets may still be of higher dimensionality. Array datatypes are typically found only if contained by a cluster"

Not sure what this means - need to try it and see. It is of course always possible to flatten 2D to 1D and store, but highly undesirable.
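For reference, the flattening fallback amounts to the following. This is a minimal sketch in plain C assuming row-major (C-order) storage; the function names are hypothetical and nothing here comes from LVHDF5.

```c
#include <assert.h>
#include <stddef.h>

/* Position of element (row, col) in a row-major 1D buffer that holds
   an nrows x ncols matrix. */
size_t flat_index(size_t row, size_t col, size_t nrows, size_t ncols)
{
    assert(row < nrows && col < ncols);
    return row * ncols + col;
}

/* Inverse mapping: recover (row, col) from a flat index.
   Note that ncols must be known from somewhere else. */
void unflatten_index(size_t idx, size_t ncols, size_t *row, size_t *col)
{
    *row = idx / ncols;
    *col = idx % ncols;
}
```

The catch is visible in unflatten_index: the 1D dataset no longer carries its own shape, so the number of columns has to travel with the data some other way (an attribute would be one option), and every reader has to know the convention.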

The biggest problem appears to be very slow data conversion using strings.

The installer puts hdf5dll.dll, szlibdll.dll, and zlib1.dll in C:\Windows\SysWOW64.

Directly calling the HDF5 DLL from LabVIEW

This is now a development project, documented at HDF5 Direct To LabVIEW.

GUIs to work with HDF5 files

There are also some nice GUI explorers such as ViTables (in Python), HDF Explorer (Windows only) and HDFView (Java, cross-platform).

H5LT: Lightweight HDF5 interface

This lightweight C wrapper looks promising if we need to roll our own interface. There is an H5LT tutorial. It is particularly attractive because we don't have to fuss with #defines. A good example is that rather than saying:
H5LTread_dataset (file_id, dset_name, H5T_NATIVE_INT, data);
which relies on H5T_NATIVE_INT being set somewhere, likely in a header file that Labview doesn't know about, we can just as well say:
H5LTread_dataset_int (file_id, dset_name, data);
which should be trivially callable from LabVIEW. Similarly, creating a (possibly multidimensional) dataset is as easy as:
H5LTmake_dataset_int (file_id, DSET3_NAME, rank, dims, data_int_in);
Note that DSET3_NAME stands for an ordinary string (the dataset name), not an opaque library constant.

There's also H5LTdtype_to_text which cheerfully converts opaque datatype enums to text strings, which something like labview can then handle in a fairly platform and library-revision independent way.

As an example of how relatively easy this makes things, the code below:
  • creates a new HDF file
  • writes a 2x3 matrix containing the numbers 1,2,3,4,5,6
  • closes the file
This takes three lines of actual code, as you'd hope!

#include "hdf5.h"
#include "hdf5_hl.h"

#define RANK 2

int main(void)
{
 hid_t       file_id;
 hsize_t     dims[RANK] = {2, 3};             // 2 rows x 3 columns
 int         data[6] = {1, 2, 3, 4, 5, 6};    // stored row by row
 herr_t      status;

 // create a new HDF5 file, overwriting any existing file of that name
 file_id = H5Fcreate("ex_lite1.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
 if (file_id < 0) return 1;

 // create and write an integer dataset named "/dset"
 status = H5LTmake_dataset_int(file_id, "/dset", RANK, dims, data);

 // close the file; report failure if either call went wrong
 if (H5Fclose(file_id) < 0 || status < 0) return 1;
 return 0;
}

Linking against hdf5dll.dll with MingW32

This is how to compile the above code with the MingW32-tdm compiler (see, for example, Building DLLs for LabVIEW for how this compiler is installed).

Add the line
#define _SSIZE_T_
before the #includes to stop sys/types.h arguing with the HDF5 includes about what ssize_t is. Urgh.

Then compile with
gcc -c ex_lite1.c -I"c:\Program Files\HDF5 1.8.6\include"

and link with
gcc -o ex_lite1.exe ex_lite1.o -L"c:\Program Files\HDF5 1.8.6\bin" -lhdf5dll -lhdf5_hldll
Info: resolving _H5T_NATIVE_INT_g by linking to __imp__H5T_NATIVE_INT_g (auto-import)
c:/mingw32/bin/../lib/gcc/mingw32/4.5.1/../../../../mingw32/bin/ld.exe: warning:
 auto-importing has been activated without --enable-auto-import specified on the
 command line.
This should work unless it involves constant data structures referencing symbols
 from auto-imported DLLs.

This whinging from the linker seems harmless, but it would be nice to know exactly what is going on. Judging by the message, H5T_NATIVE_INT resolves to a variable (H5T_NATIVE_INT_g) exported by the DLL, which the linker imports automatically. The obvious settings of C_INCLUDE_PATH don't seem to let us get rid of the -I, nor does LIBRARY_PATH let us get rid of the -L. Perhaps this is spaces in the filenames? Slashes the wrong way around? Meh, can fix with a Makefile if necessary.

This makes ex_lite1.exe, which cheerfully produces the example h5 file OK. Good.

Python interfaces

h5py is a direct Python wrapping of the C API, appropriately objecty. There's no 64-bit version on the h5py project site, but you can get one here if necessary. PyTables is a different approach; see below.

HDF5 Tables

Tabular data with columns of differing types can be stored in HDF5. You construct a compound type (a struct in C), and then make an array of them. The struct members are the columns; the array elements (one struct each) are the rows. This is basically analogous to a single table in a relational database. Clearly, these tables are useful for storing multi-channel timeseries data, amongst other things.
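The struct-as-row idea can be sketched in plain C, without calling HDF5. The row layout and field names below are hypothetical. The point is that each column is fully described by a name, a byte offset and a size, which is the kind of information the HDF5 C API asks for when a compound datatype is built with H5Tcreate(H5T_COMPOUND, ...) and H5Tinsert.

```c
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical multi-channel timeseries table. */
typedef struct {
    double time_s;      /* column "time_s" */
    int    shot;        /* column "shot" */
    float  voltage[2];  /* column "voltage": two channels per row */
} sample_row;

/* Description of one column: name, byte offset within a row, size in bytes. */
typedef struct {
    const char *name;
    size_t      offset;
    size_t      size;
} column_info;

const column_info sample_columns[] = {
    { "time_s",  offsetof(sample_row, time_s),  sizeof(double)    },
    { "shot",    offsetof(sample_row, shot),    sizeof(int)       },
    { "voltage", offsetof(sample_row, voltage), 2 * sizeof(float) },
};

enum { SAMPLE_NCOLS = sizeof sample_columns / sizeof sample_columns[0] };

/* Sum a column of doubles over nrows rows, locating the column purely
   by its byte offset, the way a generic table reader would. */
double column_sum_double(const void *rows, size_t nrows,
                         size_t row_size, size_t offset)
{
    const unsigned char *base = rows;
    double total = 0.0;
    for (size_t i = 0; i < nrows; i++) {
        double value;
        memcpy(&value, base + i * row_size + offset, sizeof value);
        total += value;
    }
    return total;
}
```

An array such as sample_row table[1000] is then the table: table[i] is row i, and column_sum_double(table, 1000, sizeof(sample_row), sample_columns[0].offset) walks down the time column.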

While tables can be created out of low-level HDF5 library calls, this is tedious, and various libraries have evolved to help. The official HDF5 H5TABLES interface is one. PyTables is another.

PyTables

PyTables may be a much easier solution for getting tabular data into and out of HDF5. It's very object oriented, and massively faster than we'll need.

Interestingly, unlike in an RDBMS, a column can contain not just atomic types like strings and numbers, but arrays or even other tables. This is the idea of a hierarchical table system. So arguably, multiple BECs in a single HDF5 file (i.e. multiple shots with the same parameters) should be rows in a table, with the BEC images being columns, and everything else being columns in the table too. Hmm. Using paths instead requires lexical names like "/bec1/images/absorption". OTOH, other tools accessing HDF5 will likely cope much better if the tables are fairly flat.

The underlying files are still HDF5, and I don't think it makes much use of metadata for its own purposes. So it shouldn't be hard to have Labview write in this format - well, it shouldn't be harder than having Labview write anything else in HDF5.

Very encouragingly, the detailed PyTables manual has this to say about interoperability with generic HDF:

PyTables can access a wide range of objects in generic HDF5 files, like compound type datasets (that can be mapped to Table objects), homogeneous datasets (that can be mapped to Array objects) or variable length record datasets (that can be mapped to VLArray objects). Besides, if a dataset is not supported, it will be mapped to a special UnImplemented class (see Section 4.14), that will let the user see that the data is there, although it will be unreachable (still, you will be able to access the attributes and some metadata in the dataset). With that, PyTables probably can access and modify most of the HDF5 files out there.

ViTables

ViTables is a GUI for inspecting HDF5 files in general, particularly aimed at fast access to large tabular data in PyTables format.

Installing it is slightly annoying. You need:
  • Fairly recent Python, including numpy > 1.4.0. I used EPD version 7.0.1.
  • PyQt4. Make sure you get the one for your version of Python! EPD-7 came with Python 2.7, so I got this one

The binary download of ViTables didn't work, so I built it from the repository. Doing hg clone http://hg.berlios.de/repos/vitables vitables_tip gets the code, and then the usual python setup.py install seemed to do the trick. It didn't like starting from cygwin, but was fine running from a cmd console as python vitables.

-- Main.LincolnTurner - 18 Feb 2011

Mathematica and HDF5

Mathematica speaks HDF5, but compound data structures are not supported (they are ignored by Import).

A basic package to read H5Tables in Mathematica is now more-or-less working. -- Main.LincolnTurner - 19 Mar 2011

Mathematica calls HDF5.exe in
<Mathematica install directory>\SystemFiles\Converters\Binaries\
which uses version 1.6.5 of the HDF5 library (in version 8 of Mathematica, at least). A very limited subset of the HDF5 functionality is exposed, in addition to the above problems.

-- Main.LincolnTurner - 07 Mar 2011
Topic revision: r11 - 08 Jul 2013, UnknownUser
 
