
HDF5

My hdf5 cheatsheet.

import h5py
import numpy as np

Create a file

f = h5py.File('demo.hdf5', 'w')
data = np.arange(10)
data
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
f['array'] = data
dset = f['array']
dset
<HDF5 dataset "array": shape (10,), type "<i8">
dset[:]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
dset[[1, 2, 5]]
array([1, 2, 5])
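The same steps can also be written with `h5py.File` as a context manager, which closes the file automatically when the block ends. This is a minimal sketch: the filename `demo_ctx.hdf5` is illustrative. It also shows an ordinary slice and `dset[()]`, which reads the whole dataset into memory.

```python
import h5py
import numpy as np

data = np.arange(10)

# 'w' creates the file, truncating it if it already exists
with h5py.File('demo_ctx.hdf5', 'w') as f:
    f['array'] = data              # shorthand for f.create_dataset('array', data=data)
    dset = f['array']
    whole = dset[()]               # read everything into a NumPy array
    part = dset[2:5]               # ordinary slice, read from disk
    picked = dset[[1, 2, 5]]       # list indexing: indices must be increasing
# the file is closed here, even if an exception was raised inside the block
```

The arrays read inside the block are ordinary in-memory NumPy arrays, so they remain usable after the file is closed. The `dset` object itself does not.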

Add additional data

f['dataset'] = data
f['full/dataset'] = data
list(f.keys())
['array', 'dataset', 'full']
grp = f['full']
'dataset' in grp
True
list(grp.keys())
['dataset']
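Assigning to a path such as `'full/dataset'` creates the intermediate group automatically. Groups can also be created explicitly. Here is a minimal sketch; the file and group names (`groups_demo.hdf5`, `raw`, `signal`) are hypothetical.

```python
import h5py
import numpy as np

with h5py.File('groups_demo.hdf5', 'w') as f:
    f['full/dataset'] = np.arange(10)       # intermediate group 'full' is created implicitly
    raw = f.create_group('raw')             # explicit group; raises if 'raw' already exists
    raw['signal'] = np.zeros(3)
    same = f.require_group('raw')           # returns the existing group instead of raising
    same_name = same.name                   # absolute path of the group
    kinds = {name: type(obj).__name__ for name, obj in f.items()}
    nested = 'raw/signal' in f              # membership tests accept full paths
```

`require_group` is convenient in scripts that may run more than once against the same file, because it does not fail when the group already exists.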

Create dataset

dset = f.create_dataset('/full/bigger', (10000, 1000, 1000, 1000), compression='gzip')  # gzip implies chunked storage; chunks are only written to disk once they receive data
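Because `compression='gzip'` forces chunked storage, HDF5 writes a chunk to disk only when that chunk receives data. That is why a dataset with a huge declared shape can be created at all. Below is a scaled-down sketch with illustrative sizes. The `chunks` shape and gzip level 4 are arbitrary choices, and `dset.id.get_storage_size()` is h5py's low-level call that reports the bytes actually allocated.

```python
import h5py
import numpy as np

with h5py.File('chunked_demo.hdf5', 'w') as f:
    dset = f.create_dataset(
        'full/bigger',
        shape=(1000, 1000),
        dtype='f4',               # float32, which is also h5py's default dtype
        chunks=(100, 100),        # each chunk is stored (and compressed) separately
        compression='gzip',
        compression_opts=4,       # gzip level, 0-9
    )
    before = dset.id.get_storage_size()  # no chunk written yet
    dset[:100, :100] = np.ones((100, 100), dtype='f4')  # fills exactly one chunk
    after = dset.id.get_storage_size()
    info = (dset.chunks, dset.compression, dset.compression_opts)
```

Choosing chunk shapes that match how the data will be read (for example, one time step per chunk) usually matters more than the compression level.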

Set attributes

dset.attrs
<Attributes of HDF5 object at 140618810188336>

Attributes, like groups, have a dictionary-like interface, so an attribute can be added like so:

dset.attrs['sampling frequency'] = 'Every other week between 1 Jan 2001 and 7 Feb 2010'
dset.attrs['PI'] = 'Fabian'
list(dset.attrs.items())
for i in dset.attrs.items():
    print(i)
('PI', 'Fabian')
('sampling frequency', 'Every other week between 1 Jan 2001 and 7 Feb 2010')
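Attribute values are not limited to strings. Numbers and small NumPy arrays can be stored too, and `attrs` supports the usual dictionary operations. This is a minimal sketch; apart from `PI`, the attribute names and values are made up for illustration.

```python
import h5py
import numpy as np

with h5py.File('attrs_demo.hdf5', 'w') as f:
    dset = f.create_dataset('array', data=np.arange(10))
    dset.attrs['PI'] = 'Fabian'                        # str, read back as str in h5py 3
    dset.attrs['n_samples'] = 480                      # stored as a scalar integer
    dset.attrs['bounds'] = np.array([0.0, 9.0])        # small arrays are allowed
    names = sorted(dset.attrs.keys())
    pi = dset.attrs['PI']
    n = int(dset.attrs['n_samples'])
    bounds = dset.attrs['bounds'].tolist()
    missing = dset.attrs.get('units', 'n/a')           # dict-style default for absent keys
    del dset.attrs['bounds']                           # attributes can be deleted too
    remaining = sorted(dset.attrs.keys())
```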

Close and reopen the file

f.close()
f = h5py.File('demo.hdf5', 'r')
list(f.keys())
['array', 'dataset', 'full']
dset = f['array']
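The mode passed to `h5py.File` decides what happens to an existing file. `'r'` opens it read-only, `'a'` opens it for reading and writing (creating it if needed), and `'w'` truncates it. Here is a small sketch of the difference, using a hypothetical `modes_demo.hdf5`.

```python
import h5py
import numpy as np

with h5py.File('modes_demo.hdf5', 'w') as f:        # create (or truncate) the file
    f['array'] = np.arange(10)

with h5py.File('modes_demo.hdf5', 'r') as f:        # read-only
    read_mode = f.mode
    first = f['array'][0]

with h5py.File('modes_demo.hdf5', 'a') as f:        # read/write, existing content kept
    append_mode = f.mode                            # File.mode reports 'r+' for any writable file
    f['extra'] = np.zeros(3)
    keys_after_append = sorted(f.keys())

with h5py.File('modes_demo.hdf5', 'w') as f:        # truncates: everything is gone
    keys_after_truncate = list(f.keys())
```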

HDF5 files are organised as a hierarchy of groups and datasets, much like folders and files; that is what the “H” (Hierarchical Data Format) stands for.

dset.name
'/array'
root = f['/']
list(root.keys())
['array', 'dataset', 'full']
list(f['full'].keys())
['bigger', 'dataset']
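`keys()` lists only one level at a time. To see the whole hierarchy at once, `Group.visititems` calls a function with the relative path and the object for every group and dataset below it. This minimal sketch rebuilds a small version of the demo layout; the file name and the `record` helper are hypothetical.

```python
import h5py
import numpy as np

with h5py.File('tree_demo.hdf5', 'w') as f:
    f['array'] = np.arange(10)
    f['dataset'] = np.arange(10)
    f['full/dataset'] = np.arange(10)
    f.create_dataset('full/bigger', shape=(10, 10), dtype='f4')

    tree = []

    def record(name, obj):
        # name is relative to f, e.g. 'full/bigger'; returning None keeps visiting
        kind = 'dataset' if isinstance(obj, h5py.Dataset) else 'group'
        tree.append((name, kind))

    f.visititems(record)
```

If the callback returns anything other than `None`, the traversal stops early and `visititems` returns that value. This is handy for searching for the first matching object.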

Sources