h5extractf not working properly with data and neutrino files
Hi @sreck, I just converted one ARCA6 data file with h5extractf; the exact command was:

```shell
h5extractf \
    -o ${DIR}/h5/${file}.h5 \
    --provenance-file=${DIR}/h5/provenance/provenance_${file} \
    ${DIR}/reco/${file}.root
```
Then, @ninavdmeulen converted it with OrcaSong using:

```python
fg = FileGraph(
    max_n_hits=5000,
    extractor=get_real_data_info_extr(inputfile),
    det_file=detectorfile,
    keep_event_info=True,
)
```
and finally, Nina ran the inference on that file with an already-trained track/shower classifier, but it fails with the following error:
```
WARNING: underlay of /usr/bin/nvidia-smi required more than 50 (478) bind mounts
2022-06-10 11:53:55.473903: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-10 11:53:59.503297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30985 MB memory: -> device: 0, name: NVIDIA Tesla V100-PCIE-32GB, pci bus id: 0000:3b:00.0, compute capability: 7.0
2022-06-10 11:53:59.509574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30985 MB memory: -> device: 1, name: NVIDIA Tesla V100-PCIE-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0
2022-06-10 11:53:59.511300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 30985 MB memory: -> device: 2, name: NVIDIA Tesla V100-PCIE-32GB, pci bus id: 0000:af:00.0, compute capability: 7.0
Automatically set epoch to epoch 22 file 1.
Loading saved model: saved_models/model_epoch_22_file_1.h5
Number of GPUs: 3
Working on file ML_datav7.0.jchain.arca.aashower.00009635.h5
Creating temporary file /sps/km3net/users/nvanderm/ML_project/Output/ARCA6/ts/test/predictions/inference/temp_model_epoch_22_file_1_on_ML_datav7.0.jchain.arca.aashower.00009635.h5_10-06-2022-10-54-03
Predicting in step 0/5370 (0.00%)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/orcanet/lib/label_modifiers.py", line 274, in __call__
    particle_type = y_values["particle_type"]
ValueError: no field of name particle_type

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/orcanet", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/orcanet/parser.py", line 251, in main
    func(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/orcanet/parser.py", line 85, in inference
    return orga.inference(epoch=epoch, fileno=fileno)
  File "/usr/local/lib/python3.8/dist-packages/orcanet/core.py", line 399, in inference
    return [filename for filename in gen]
  File "/usr/local/lib/python3.8/dist-packages/orcanet/core.py", line 399, in <listcomp>
    return [filename for filename in gen]
  File "/usr/local/lib/python3.8/dist-packages/orcanet/core.py", line 428, in _inference
    backend.h5_inference(
  File "/usr/local/lib/python3.8/dist-packages/orcanet/backend.py", line 234, in h5_inference
    info_blob = next(itergen)
  File "/usr/local/lib/python3.8/dist-packages/keras/utils/data_utils.py", line 485, in __iter__
    for item in (self[i] for i in range(len(self))):
  File "/usr/local/lib/python3.8/dist-packages/keras/utils/data_utils.py", line 485, in <genexpr>
    for item in (self[i] for i in range(len(self))):
  File "/usr/local/lib/python3.8/dist-packages/orcanet/h5_generator.py", line 172, in __getitem__
    ys = self.label_modifier(info_blob)
  File "/usr/local/lib/python3.8/dist-packages/orcanet/lib/label_modifiers.py", line 277, in __call__
    if not self._warned:
AttributeError: 'TSClassifier' object has no attribute '_warned'
Exception ignored in: <function Pool.__del__ at 0x2b0900480040>
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 268, in __del__
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 362, in put
AttributeError: 'NoneType' object has no attribute 'dumps'
```
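Note the secondary `AttributeError`: the warning branch of the label modifier references `self._warned` before it was ever assigned, so even the intended fallback for missing MC fields crashes. A minimal sketch of the pattern (a hypothetical stand-in, not the actual orcanet `TSClassifier`):

```python
import warnings
import numpy as np


class TSClassifierSketch:
    """Hypothetical stand-in for orcanet's TSClassifier label modifier,
    illustrating the missing ``_warned`` initialization."""

    def __init__(self):
        # Setting the flag in __init__ avoids the AttributeError above,
        # which occurs when the MC field is missing and the warning
        # branch runs before _warned was ever assigned.
        self._warned = False

    def __call__(self, info_blob):
        y_values = info_blob["y_values"]
        try:
            particle_type = y_values["particle_type"]
        except (KeyError, ValueError):
            # Real-data files carry no MC truth: warn once, return no labels.
            if not self._warned:
                warnings.warn("no particle_type field; skipping labels")
                self._warned = True
            return None
        return {"ts_output": particle_type}
```

With this guard in place, inference on a real-data file would skip label generation instead of raising.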
The file after h5extractf extraction is:

```
/sps/km3net/repo/data_processing/tag/v6_ARCA6_run_by_run/prod/data/KM3NeT_00000075/v7.0/h5/datav7.0.jchain.arca.aashower.00009635.h5
```

and the one after OrcaSong conversion is:

```
/sps/km3net/repo/data_processing/tag/v6_ARCA6_run_by_run/prod/mc/atm_muon/KM3NeT_00000075/v6.3/h5/mcv6.3.mupage_100G.sirene.jterbr00009635.jchain.aashower.1.h5
```
It seems the label modifier looks for an MC truth variable (`particle_type`) which is not present in the data file.
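One way to confirm this is to compare the fields of the event-info array that the generator hands to the label modifier as `y_values`. A minimal sketch using NumPy structured dtypes (the field names besides `particle_type` are illustrative, not the actual OrcaSong schema):

```python
import numpy as np


def has_field(y_values, name):
    """Return True if the structured array carries the given field."""
    return name in (y_values.dtype.names or ())


# Illustrative dtypes: real-data event info vs. MC event info.
y_real = np.zeros(3, dtype=[("event_id", "<f8"), ("run_id", "<f8")])
y_mc = np.zeros(3, dtype=[("event_id", "<f8"), ("particle_type", "<f8")])

print(has_field(y_real, "particle_type"))  # False: triggers the error above
print(has_field(y_mc, "particle_type"))    # True
```

The same check can be run against the converted HDF5 file itself (e.g. reading the dataset's compound dtype with h5py), assuming OrcaSong stores the event info in a structured dataset.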