h5extract not fully compatible with new Mass Processing merged format
Summary
The new file format of the mass processing production, which uses snakemake, merges all neutrino types into a single file. The header of this file has less information than the headers of the individual files; in particular, the field simul is missing. As a result, h5extract cannot write the w2list to the .h5 files: in the class EventInfoTabulator at https://git.km3net.de/km3py/km3pipe/-/blob/master/src/km3modules/io.py#L303, the condition if "simul" in blob["header"].keys(): in line 313 is what allows this information to be included in the final file. This is information we would like to keep in the final .h5 files.
Describe a possible workaround to achieve the same functionality
The new files follow the naming convention KM3NeT_{detid}_{run}.mc.{generator}.{light}.{jterbr}.{recoA_recoB_positioning}.{version}.root, so one could make a dictionary that maps the generator field to the value expected by the variable sim_program in the function _unfold_w2list(self, w2list, sim_program) at https://git.km3net.de/km3py/km3pipe/-/blob/master/src/km3modules/io.py#L352. However, I am not sure whether the file name can be accessed directly from the Blob object, or whether an easier option exists from this side. Alternatively, the header information could be kept at the merging step of the files. I don't really know which option is the most suitable.
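
To make the dictionary idea more concrete, here is a minimal sketch of how the generator field could be read from the file name and mapped to a sim_program value. It is not km3pipe code: the function sim_program_from_filename, the regular expression, and the keys and values of GENERATOR_TO_SIM_PROGRAM are all hypothetical placeholders, and the real generator tokens and the strings that _unfold_w2list expects would have to be checked. Only the part of the name up to the generator field is parsed, since later fields (for example the version) might themselves contain dots.

```python
import os
import re

# Matches only the start of the naming convention
# KM3NeT_{detid}_{run}.mc.{generator}.<rest>; the remaining fields are ignored.
FILENAME_PATTERN = re.compile(
    r"^KM3NeT_(?P<detid>\d+)_(?P<run>\d+)\.mc\.(?P<generator>[^.]+)\."
)

# Hypothetical mapping: both the generator tokens (keys) and the
# sim_program strings (values) are assumptions for illustration.
GENERATOR_TO_SIM_PROGRAM = {
    "gsg": "gseagen",
    "gseagen": "gseagen",
    "genhen": "genhen",
}


def sim_program_from_filename(path):
    """Return the assumed sim_program for a merged file, or None if unknown."""
    match = FILENAME_PATTERN.match(os.path.basename(path))
    if match is None:
        # The file name does not follow the merged naming convention.
        return None
    generator = match.group("generator").lower()
    # Unknown generator tokens also yield None instead of raising.
    return GENERATOR_TO_SIM_PROGRAM.get(generator)
```

Returning None for an unrecognised name would let the caller fall back to the current behaviour (no w2list written) instead of failing. This still requires the file name to be available at the point where the header is inspected, which is the open question above.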