Unintended casting
While testing reading two files one after the other and extracting some values, I got this huge error:
2020-06-23 17:06:47 CRITICAL ++ km3pipe.dataclasses: dtype mismatch! Matching field names but differing field types, no chance to reorder.
dtype of data: (numpy.record, [('Erange_max', '<i8'), ('Erange_min', '<i8'), ('JENERGY_CHI2', '<f8'), ('JENERGY_ENERGY', '<f8'), ('JENERGY_MUON_RANGE_METRES', '<f8'), ('JENERGY_NDF', '<f8'), ('JENERGY_NOISE_LIKELIHOOD', '<f8'), ('JENERGY_NUMBER_OF_HITS', '<f8'), ('JGANDALF_BETA0_RAD', '<f8'), ('JGANDALF_BETA1_RAD', '<f8'), ('JGANDALF_CHI2', '<f8'), ('JGANDALF_LAMBDA', '<f8'), ('JGANDALF_NUMBER_OF_HITS', '<f8'), ('JGANDALF_NUMBER_OF_ITERATIONS', '<f8'), ('JSTART_LENGTH_METRES', '<f8'), ('JSTART_NPE_MIP', '<f8'), ('JSTART_NPE_MIP_TOTAL', '<f8'), ('JVETO_NPE', '<f8'), ('JVETO_NUMBER_OF_HITS', '<f8'), ('W2LIST_GSEAGEN_BX', '<f8'), ('W2LIST_GSEAGEN_BY', '<f8'), ('W2LIST_GSEAGEN_CC', '<f8'), ('W2LIST_GSEAGEN_COLUMN_DEPTH', '<f8'), ('W2LIST_GSEAGEN_EG', '<f8'), ('W2LIST_GSEAGEN_ICHAN', '<f8'), ('W2LIST_GSEAGEN_PS', '<f8'), ('W2LIST_GSEAGEN_P_EARTH', '<f8'), ('W2LIST_GSEAGEN_P_SCALE', '<f8'), ('W2LIST_GSEAGEN_WATER_INT_LEN', '<f8'), ('W2LIST_GSEAGEN_XSEC_MEAN', '<f8'), ('dir_x', '<f8'), ('dir_y', '<f8'), ('dir_z', '<f8'), ('energy', '<f8'), ('gandalf_best_chi2_red', '<f8'), ('gandalf_dir_x', '<f8'), ('gandalf_dir_y', '<f8'), ('gandalf_dir_z', '<f8'), ('gandalf_energy', '<f8'), ('gandalf_is_good', '<f8'), ('gandalf_likelihood', '<f8'), ('gandalf_pos_x', '<f8'), ('gandalf_pos_y', '<f8'), ('gandalf_pos_z', '<f8'), ('is_cc', '<i8'), ('is_neutrino', '<i8'), ('jsh_dir_x', '<f8'), ('jsh_dir_y', '<f8'), ('jsh_dir_z', '<f8'), ('jsh_energy', '<f8'), ('jsh_is_good', '<f8'), ('jsh_likelihood', '<f8'), ('jsh_pos_x', '<f8'), ('jsh_pos_y', '<f8'), ('jsh_pos_z', '<f8'), ('livetime_sec', '<f8'), ('mc_id', '<i4'), ('n_events_gen', '<f8'), ('pos_x', '<f8'), ('pos_y', '<f8'), ('pos_z', '<f8'), ('run_from_header', '<i8'), ('run_id', '<i4'), ('type', '<i4'), ('w1', '<f8'), ('w2', '<f8'), ('w3', '<f8'), ('weight_one_year', '<i8'), ('group_id', '<i8')])
requested dtype: [('Erange_max', '<i8'), ('Erange_min', '<i8'), ('JENERGY_CHI2', '<f8'), ('JENERGY_ENERGY', '<f8'), ('JENERGY_MUON_RANGE_METRES', '<f8'), ('JENERGY_NDF', '<f8'), ('JENERGY_NOISE_LIKELIHOOD', '<f8'), ('JENERGY_NUMBER_OF_HITS', '<f8'), ('JGANDALF_BETA0_RAD', '<f8'), ('JGANDALF_BETA1_RAD', '<f8'), ('JGANDALF_CHI2', '<f8'), ('JGANDALF_LAMBDA', '<f8'), ('JGANDALF_NUMBER_OF_HITS', '<f8'), ('JGANDALF_NUMBER_OF_ITERATIONS', '<f8'), ('JSTART_LENGTH_METRES', '<f8'), ('JSTART_NPE_MIP', '<f8'), ('JSTART_NPE_MIP_TOTAL', '<f8'), ('JVETO_NPE', '<f8'), ('JVETO_NUMBER_OF_HITS', '<f8'), ('W2LIST_GSEAGEN_BX', '<f8'), ('W2LIST_GSEAGEN_BY', '<f8'), ('W2LIST_GSEAGEN_CC', '<f8'), ('W2LIST_GSEAGEN_COLUMN_DEPTH', '<f8'), ('W2LIST_GSEAGEN_EG', '<f8'), ('W2LIST_GSEAGEN_ICHAN', '<f8'), ('W2LIST_GSEAGEN_PS', '<f8'), ('W2LIST_GSEAGEN_P_EARTH', '<f8'), ('W2LIST_GSEAGEN_P_SCALE', '<f8'), ('W2LIST_GSEAGEN_WATER_INT_LEN', '<f8'), ('W2LIST_GSEAGEN_XSEC_MEAN', '<f8'), ('dir_x', '<f8'), ('dir_y', '<f8'), ('dir_z', '<f8'), ('energy', '<f8'), ('gandalf_best_chi2_red', '<f8'), ('gandalf_dir_x', '<f8'), ('gandalf_dir_y', '<f8'), ('gandalf_dir_z', '<f8'), ('gandalf_energy', '<f8'), ('gandalf_is_good', '<f8'), ('gandalf_likelihood', '<f8'), ('gandalf_pos_x', '<f8'), ('gandalf_pos_y', '<f8'), ('gandalf_pos_z', '<f8'), ('is_cc', '<i8'), ('is_neutrino', '<i8'), ('jsh_dir_x', '<f8'), ('jsh_dir_y', '<f8'), ('jsh_dir_z', '<f8'), ('jsh_energy', '<f8'), ('jsh_is_good', '<f8'), ('jsh_likelihood', '<f8'), ('jsh_pos_x', '<f8'), ('jsh_pos_y', '<f8'), ('jsh_pos_z', '<f8'), ('livetime_sec', '<f8'), ('mc_id', '<i4'), ('n_events_gen', '<i8'), ('pos_x', '<f8'), ('pos_y', '<f8'), ('pos_z', '<f8'), ('run_from_header', '<i8'), ('run_id', '<i4'), ('type', '<i4'), ('w1', '<f8'), ('w2', '<f8'), ('w3', '<f8'), ('weight_one_year', '<i8'), ('group_id', '<i8')]
2020-06-23 17:06:47 CRITICAL ++ km3pipe.io.hdf5.HDF5Sink.HDF5Sink: Cannot write a table to '/summary' since its dtype is different compared to the previous table with the same HDF5 location, which was used to fix the dtype of the HDF5 compund type.
Traceback (most recent call last):
File "pipe.py", line 44, in <module>
pipe.drain()
File "/project/antares/public_student_software/venvs/km3pipe-v9-alpha14/lib/python3.7/site-packages/thepipe/core.py", line 423, in drain
return self._drain(cycles)
File "/project/antares/public_student_software/venvs/km3pipe-v9-alpha14/lib/python3.7/site-packages/thepipe/core.py", line 372, in _drain
new_blob = module(blob_to_send)
File "/project/antares/public_student_software/venvs/km3pipe-v9-alpha14/lib/python3.7/site-packages/thepipe/core.py", line 178, in __call__
return self.process(*args, **kwargs)
File "/project/antares/public_student_software/venvs/km3pipe-v9-alpha14/lib/python3.7/site-packages/km3pipe/io/hdf5.py", line 487, in process
data = self._process_entry(key, entry)
File "/project/antares/public_student_software/venvs/km3pipe-v9-alpha14/lib/python3.7/site-packages/km3pipe/io/hdf5.py", line 474, in _process_entry
self._write_table(entry.h5loc, entry, title=title)
File "/project/antares/public_student_software/venvs/km3pipe-v9-alpha14/lib/python3.7/site-packages/km3pipe/io/hdf5.py", line 396, in _write_table
arr = Table(arr, dtype=tab.dtype)
File "/project/antares/public_student_software/venvs/km3pipe-v9-alpha14/lib/python3.7/site-packages/km3pipe/dataclasses.py", line 175, in __new__
raise ValueError("dtype mismatch")
ValueError: dtype mismatch
Closing remaining open files:test2.h5...done
This basically boils down to a mismatch in a single field:
dtype of data: ('n_events_gen', '<f8')
requested dtype: ('n_events_gen', '<i8')
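For anyone hitting a similar wall of output, here is a minimal sketch of how the differing field can be picked out automatically instead of by eye. It works on plain lists of (name, type string) pairs, as printed in the log above; `diff_fields` is a hypothetical helper name, not part of km3pipe, and the three-field lists are a shortened excerpt of the real dtypes.

```python
def diff_fields(data_descr, requested_descr):
    """Return (name, data_type, requested_type) for fields whose types differ.

    Both arguments are lists of (field_name, type_string) pairs, i.e. the
    same shape as the dtype listings in the log message.
    """
    requested = dict(requested_descr)
    return [
        (name, typ, requested[name])
        for name, typ in data_descr
        if name in requested and requested[name] != typ
    ]


# Shortened excerpt of the two dtypes from the log (hypothetical subset).
data_descr = [("mc_id", "<i4"), ("n_events_gen", "<f8"), ("pos_x", "<f8")]
requested_descr = [("mc_id", "<i4"), ("n_events_gen", "<i8"), ("pos_x", "<f8")]

print(diff_fields(data_descr, requested_descr))
# [('n_events_gen', '<f8', '<i8')]
```

With NumPy structured arrays, a list of this shape should be available through the `descr` attribute of the dtype.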
In the original GSG files the values are:
file1: genvol [...] 8.00E+05
file2: genvol [...] 7.00E+06
But when opening aanet files with km3io:
f.header.genvol.numberOfEvents
800000
for file1 and
f.header.genvol.numberOfEvents
7000000.0
for file2.
So indeed for one of the files this value is cast to float.
I worked around this by casting to int in my own km3pipe Module:
blob.summary["n_events_gen"] = int(self.n_events_gen)
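A slightly more defensive variant of that cast, as a sketch only: it refuses to silently truncate if the header value ever turns out not to be a whole number. `as_exact_int` is a hypothetical helper of my own, not something km3pipe or km3io provides.

```python
def as_exact_int(value):
    """Convert value to int, raising if that would drop a fractional part."""
    if isinstance(value, int):
        # Already an integer, nothing to do (also avoids float rounding
        # for very large integers).
        return value
    as_float = float(value)
    if not as_float.is_integer():
        raise ValueError(f"expected a whole number, got {value!r}")
    return int(as_float)


print(as_exact_int(800000))     # 800000
print(as_exact_int(7000000.0))  # 7000000
```

In the Module this would replace the plain `int(...)` call, so that both files end up with an integer `n_events_gen` and the table dtype stays the same from one file to the next.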
But I believe this cast to float is not supposed to happen, since both files were produced with the same versions of GSG, JPP, aanet, etc.