Shuffling data in HDF5Pump
Summary
@mmoser would like to be able to shuffle groups when reading an HDF5 file with HDF5Pump.
Something like this:
pipe.attach(HDF5Pump, filename='...', shuffle=True)
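To illustrate what shuffle=True could mean, here is a minimal sketch in plain Python that visits the groups in a random order instead of file order. It does not use km3pipe. The function iter_groups and the read_group callback are hypothetical stand-ins for whatever HDF5Pump does internally to load one group, and the optional seed is an assumption added so that runs can be reproduced.

```python
import random
from typing import Callable, Dict, Iterator, List, Optional


def iter_groups(
    group_ids: List[int],
    read_group: Callable[[int], Dict],
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> Iterator[Dict]:
    """Yield one blob per group, optionally in a random order.

    `read_group` is a hypothetical callback that loads a single group;
    it is not part of the HDF5Pump API.
    """
    order = list(group_ids)  # copy so the caller's list is not modified
    if shuffle:
        random.Random(seed).shuffle(order)  # seeded RNG for reproducible order
    for group_id in order:
        yield read_group(group_id)
```

For example, list(iter_groups([0, 1, 2, 3], lambda g: {"group_id": g}, shuffle=True, seed=42)) yields every group exactly once, but not necessarily in ascending order.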
What is the current workaround to achieve the same functionality?
Read the group_id values from the /group_info table in advance, shuffle them to build a one-to-one map from old to new IDs, and then rewrite the group_id of each blob item in a separate module.
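The workaround can be sketched roughly as follows, assuming the group_id values have already been read from /group_info into a Python list. Blobs are modelled here as plain dicts with a "group_id" key. That is a simplification: a real km3pipe blob stores group_id inside its tables. RemapGroupId is a hypothetical name and not an existing km3pipe module.

```python
import random
from typing import Dict, List, Optional


def make_shuffled_map(group_ids: List[int], seed: Optional[int] = None) -> Dict[int, int]:
    """Map each original group_id to a unique new one (a random permutation)."""
    shuffled = list(group_ids)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(group_ids, shuffled))


class RemapGroupId:
    """Hypothetical stand-in for the separate module that rewrites group_id.

    Blobs are plain dicts here; in km3pipe the group_id lives in the
    blob's tables, so a real module would update those columns instead.
    """

    def __init__(self, mapping: Dict[int, int]):
        self.mapping = mapping

    def process(self, blob: Dict) -> Dict:
        # Replace the old ID with its shuffled counterpart
        blob["group_id"] = self.mapping[blob["group_id"]]
        return blob
```

Because the map is a permutation of the original IDs, no two groups end up with the same new group_id. Any downstream step that orders by group_id will then see the groups in shuffled order.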