HDF5

2021-01-13

My HDF5 cheatsheet, using h5py.

import h5py
import numpy as np

Create a file

f = h5py.File('demo.hdf5', 'w')

data = np.arange(10)
data

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

f['array'] = data

dset = f['array']

dset

<HDF5 dataset "array": shape (10,), type "<i8">

dset[:]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

dset[[1, 2, 5]]

array([1, 2, 5])
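
The handle above stays open until `f.close()` is called. A minimal sketch of the same steps with a `with` block, which closes the file on exit. The scratch path and the file name `sketch.hdf5` are assumptions for illustration, chosen so the sketch does not touch `demo.hdf5`.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch location, not part of the cheatsheet above.
path = os.path.join(tempfile.mkdtemp(), 'sketch.hdf5')

# Write: the file is closed automatically when the block ends.
with h5py.File(path, 'w') as fh:
    fh['array'] = np.arange(10)

# Read: indexing a dataset returns a NumPy array with the selected values.
with h5py.File(path, 'r') as fh:
    first_three = fh['array'][:3]
    picked = fh['array'][[1, 2, 5]]  # index lists must be in increasing order

print(first_three, picked)
```

Note that the arrays are read inside the `with` block; once the file is closed, the dataset object itself can no longer be used.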

Add additional data

f['dataset'] = data

f['full/dataset'] = data

list(f.keys())

['array', 'dataset', 'full']

grp = f['full']

'dataset' in grp

True

list(grp.keys())

['dataset']
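
Assigning to a path such as `'full/dataset'` creates the intermediate group `full` implicitly. A short sketch comparing that with an explicit `create_group` call. The group name `explicit` and the scratch path are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from demo.hdf5.
path = os.path.join(tempfile.mkdtemp(), 'groups.hdf5')

with h5py.File(path, 'w') as fh:
    # Implicit: the group "full" is created on the way to the dataset.
    fh['full/dataset'] = np.arange(10)

    # Explicit: create the group first, then add a dataset to it.
    grp = fh.create_group('explicit')
    grp['dataset'] = np.arange(5)

    top_level = sorted(fh.keys())
    full_is_group = isinstance(fh['full'], h5py.Group)
    nested_name = fh['explicit/dataset'].name

print(top_level, full_is_group, nested_name)
```

Both routes end up with the same kind of object: a group that behaves like a dictionary of datasets and further groups.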

Create dataset

dset = f.create_dataset('/full/bigger', (10000, 1000, 1000, 1000), compression='gzip')
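
The shape above describes 10^13 values, far more than would normally fit on disk. As far as I understand it, this works because a compressed dataset is stored in chunks, and chunks are only allocated once something is written to them. A smaller sketch of the same idea; the shape, the dataset name and the scratch path are assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from demo.hdf5.
path = os.path.join(tempfile.mkdtemp(), 'chunked.hdf5')

shape = (1000, 1000, 100)
logical_bytes = int(np.prod(shape)) * 4  # float32 takes 4 bytes per value

with h5py.File(path, 'w') as fh:
    big = fh.create_dataset('bigger', shape, dtype='f4', compression='gzip')
    chunk_shape = big.chunks    # chunk layout picked automatically by h5py
    codec = big.compression     # name of the compression filter in use

    # Only the chunk holding these ten values has to be stored.
    big[0, 0, :10] = np.arange(10)

size_on_disk = os.path.getsize(path)
print(chunk_shape, codec, logical_bytes, size_on_disk)
```

Comparing `logical_bytes` with `size_on_disk` shows how little of the declared shape is actually stored.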

dset.attrs

<Attributes of HDF5 object at 140618810188336>

Attributes, like groups, behave like a dictionary, so an attribute can be added like so:

dset.attrs['sampling frequency'] = 'Every other week between 1 Jan 2001 and 7 Feb 2010'
dset.attrs['PI'] = 'Fabian'

list(dset.attrs.items())
for i in dset.attrs.items():
    print(i)

('PI', 'Fabian')
('sampling frequency', 'Every other week between 1 Jan 2001 and 7 Feb 2010')
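
Attributes are not limited to strings or to datasets. A minimal sketch that stores a number and a small array as attributes, and also tags the file itself. The names `readings`, `sample_rate_hz`, `grid` and `title` are hypothetical and only there for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file, separate from demo.hdf5.
path = os.path.join(tempfile.mkdtemp(), 'attrs.hdf5')

with h5py.File(path, 'w') as fh:
    d = fh.create_dataset('readings', data=np.arange(4))
    d.attrs['PI'] = 'Fabian'
    d.attrs['sample_rate_hz'] = 50.0        # scalar attribute
    d.attrs['grid'] = np.array([2, 2])      # small array attribute
    fh.attrs['title'] = 'attribute sketch'  # files and groups have attrs too

with h5py.File(path, 'r') as fh:
    attrs = fh['readings'].attrs
    names = sorted(attrs.keys())
    pi = attrs['PI']
    rate = float(attrs['sample_rate_hz'])
    grid = attrs['grid'].tolist()
    missing = attrs.get('units', 'n/a')     # dict-style default for absent keys
    title = fh.attrs['title']

print(names, pi, rate, grid, missing, title)
```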

f.close()

f = h5py.File('demo.hdf5', 'r')

list(f.keys())

['array', 'dataset', 'full']

dset = f['array']

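HDF5 files are organised as a hierarchy of groups and datasets, much like folders and files. That's what the "H" stands for: Hierarchical Data Format. Every object has a path-like name starting at the root group `/`.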

dset.name

'/array'

root = f['/']

list(root.keys())

['array', 'dataset', 'full']

list(f['full'].keys())

['bigger', 'dataset']
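
Listing keys one level at a time gets tedious for deeper files. A sketch that walks the whole hierarchy with `visititems`, which calls a function for every object below the starting group. The helper name `record` and the scratch file layout are assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file with a small hierarchy, separate from demo.hdf5.
path = os.path.join(tempfile.mkdtemp(), 'tree.hdf5')

with h5py.File(path, 'w') as fh:
    fh['array'] = np.arange(10)
    fh['full/dataset'] = np.arange(10)

found = []

def record(name, obj):
    """Collect (relative path, kind) for each visited object."""
    kind = 'group' if isinstance(obj, h5py.Group) else 'dataset'
    found.append((name, kind))
    # Returning None (implicitly) tells visititems to keep going.

with h5py.File(path, 'r') as fh:
    fh.visititems(record)

for name, kind in found:
    print(kind, name)
```

The names passed to the callback are relative to the group the walk started from, so they carry no leading `/`.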