Data Structure draft

This is a draft, the content is not complete and of poor quality!

Hierarchical Data Format (HDF)

  • Designed to store and organize large amounts of data.
  • Stores multiple datasets in a single file.
    • Different types of information.
    • Self describing (metadata included in the file)
  • Properties[ref]:
    • Datasets (numpy arrays): fast slicing, compression.
    • Groups (dictionaries): nesting, POSIX path syntax.
    • Attributes (metadata): key-value pairs attached to datasets or groups.
  • HDF5 is row based and much more efficient than CSV for very large files[ref].
  • Extensions: .h5, .hdf, .hdf4, ...
  • Tool: HDFView
  • Example[ref]:

Figure: An example HDF5 file structure that contains groups, datasets and associated metadata.
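
A structure of this kind (groups, datasets, attached metadata) can be built with h5py. The following is only a minimal sketch under assumptions: the group path `experiment/run1`, the dataset name `temperature`, and the attribute keys and values are hypothetical and chosen for illustration. The file name `mytestfile.hdf5` and the dataset name `mydataset` are taken from the example in this post.

```python
import h5py
import numpy as np

# Create the file; mode 'w' overwrites any existing file of the same name.
with h5py.File('mytestfile.hdf5', 'w') as f:
    # A dataset at the root, stored with gzip compression.
    dset = f.create_dataset('mydataset',
                            data=np.arange(100).reshape(10, 10),
                            compression='gzip')
    # Attributes are key-value metadata attached to a dataset or group.
    dset.attrs['units'] = 'counts'            # hypothetical attribute

    # Groups nest like directories and use POSIX-style paths.
    grp = f.create_group('experiment/run1')   # hypothetical group path
    grp.attrs['operator'] = 'alice'           # hypothetical attribute
    grp.create_dataset('temperature', data=np.linspace(20.0, 25.0, 50))

# Read it back: dict-style access, path syntax, and NumPy-style slicing.
with h5py.File('mytestfile.hdf5', 'r') as f:
    names = sorted(f.keys())                  # top-level groups and datasets
    units = f['mydataset'].attrs['units']
    temp_shape = f['experiment/run1/temperature'].shape
    block = f['mydataset'][2:4, :3]           # only this slice is read
```

Slicing a dataset returns a NumPy array holding only the requested region, which is what makes partial reads of large files practical.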

import h5py

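# Open an existing file read-only; the with-block closes it on exit
with h5py.File('mytestfile.hdf5', 'r') as f:
    # h5py.File acts like a Python dict: keys are group and dataset names
    dset = f['mydataset']
    # dset.attrs is a dict-like view of the dataset's metadata
    attrs = dict(dset.attrs)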
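
Because the file behaves like a dictionary, its contents can also be listed and walked. The sketch below is illustrative only: the file name `demo.hdf5`, the names `group_a` and `values`, the `description` attribute, and the helper `record` are all hypothetical. It writes a small file to a temporary directory first so that it runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical demo file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.hdf5')

with h5py.File(path, 'w') as f:
    f.create_dataset('mydataset', data=np.zeros((4, 3)))
    f['mydataset'].attrs['description'] = 'all zeros'
    # Intermediate groups in the path are created automatically.
    f.create_dataset('group_a/values', data=np.arange(5))

found = {}

def record(name, obj):
    # Called once for every group and dataset below the root.
    found[name] = 'dataset' if isinstance(obj, h5py.Dataset) else 'group'

with h5py.File(path, 'r') as f:
    has_dataset = 'mydataset' in f        # dict-style membership test
    top_level = sorted(f.keys())          # names directly under the root
    f.visititems(record)                  # recursive walk of the hierarchy
    attrs = dict(f['mydataset'].attrs)    # copy metadata into a plain dict
```

`keys()` only lists the direct children of a group, while `visititems` descends into nested groups, so the two give different views of the same file.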