It seems to me that right now km3io saves trk.E as the corrected (reconstructed) energy; however, we need JENERGY_ENERGY (the uncorrected energy), since the energy correction for e.g. ARCA2 is not working properly. I attach a reco-level plot for ARCA2 where I plot tracks.energy. You can see the "bad stuff" happening at the very low energies. I have encountered this before, and it is definitely due to the bad energy correction.
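For reference, a minimal sketch of what I mean by reading the uncorrected value, assuming the per-track fit information exposed as r.tracks.reco (used in the snippet below) actually carries a JENERGY_ENERGY column; the field name, the access pattern, and the file name are assumptions on my part, not confirmed km3io API:

```python
# Hedged sketch: read the uncorrected JGandalf energy by name from the
# per-track fit record. The "JENERGY_ENERGY" column and this access pattern
# are assumptions; "some_file.aanet.root" is a placeholder.
import km3io as ki

for r in ki.OfflineReader("some_file.aanet.root"):
    if r.tracks.E.size and "JENERGY_ENERGY" in r.tracks.reco.dtype.names:
        uncorrected_E = r.tracks.reco["JENERGY_ENERGY"][0]
        corrected_E = r.tracks.E[0]
        break
```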
```
>>> import km3pipe as kp
>>> from km3pipe.dataclasses import Table
>>> import numpy as np
>>> from glob import glob
>>> import os.path
>>> import km3io as ki
>>> import h5py
>>>
>>> DET = '00000014'
>>> PRIM = 'Fe'
>>> #FOLDER = '/sps/km3net/users/kakiczi/CORSIKA_checks/SIBYLL-2.3/gamma_-1/' # Lyon
... FOLDER = '/mnt/home/pkalaczynski/CORSIKA_production/SIBYLL_2.3/charm/gamma_-1/' # CIŚ
>>> #FOLDERS = '/sps/km3net/users/kakiczi/CORSIKA_checks/SIBYLL-2.3/gamma_-1/*/gSeaGen_processed/mc/manual/KM3NeT_'+DET+'/v5.8/reco'
... FOLDERS = FOLDER+PRIM+'/gSeaGen_processed/mc/manual/KM3NeT_'+DET+'/v5.8/reco' # CIŚ
>>> FNAMES = glob(os.path.join(FOLDERS, "*aanet*root")) # " + i + "
>>> FNAMES.sort() # ensures that we go in alphabetical/numeric order
>>>
>>> VARS = [[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],]
>>> for f in FNAMES:
...     print("Opening ", f)
...     if len(ki.OfflineReader(f)):
...         for r in ki.OfflineReader(f):
...             if r.tracks.E.size: # only taking non-empty tracks
...                 if r.tracks.rec_stages[0]==[1,2,3,4,5]: # only fully-reconstructed
...                     print("corrected: ", r.tracks.E[0])
...                     print("reco keys: ", r.tracks.reco.dtype.names)
...                     print("reco: ", r.tracks.reco)
...
Opening  /mnt/home/pkalaczynski/CORSIKA_production/SIBYLL_2.3/charm/gamma_-1/Fe/gSeaGen_processed/mc/manual/KM3NeT_00000014/v5.8/reco/mcv5.8.DAT000001.gSeaGen.sirene.jte.jchain.aanet.1.root
corrected:  245013.2211937742
Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
  File "/mnt/home/pkalaczynski/miniconda3/envs/km3net/lib/python3.7/site-packages/km3io/offline.py", line 608, in reco
    self._reco = np.core.records.fromarrays(fit_data.transpose(), names=keys)
  File "/mnt/home/pkalaczynski/miniconda3/envs/km3net/lib/python3.7/site-packages/numpy/core/records.py", line 592, in fromarrays
    shape = arrayList[0].shape
IndexError: list index out of range
```
Yeah, we already discussed this via RocketChat. It's indeed from empty tracks. It's weird, because they are sometimes just all 0 and sometimes empty lists.
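For what it's worth, the IndexError in the traceback above can be reproduced without km3io at all, which supports the empty-tracks explanation; this is just a sketch of the failure mode, not the km3io internals:

```python
# Rough reproduction of the failure mode (independent of km3io): an event with
# no reconstructed tracks means fromarrays receives an empty list of columns,
# and numpy then fails at arrayList[0].shape.
import numpy as np

np.core.records.fromarrays([], names="a")  # raises IndexError: list index out of range
```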
Well, that's indeed not an MWE, but I think it's still rather clear what's going on. Perhaps I could optimize the code, but I am quite lazy and not sure if it would really be faster if I used np.mask here.
As you can see, I use the .counts attribute to select the events where at least one entry is present; then I use it as a mask for the first dimension (and 0 for the second to select the first track).
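Something like this, as a toy sketch with a hand-made jagged array (awkward 0.x API, as shipped with uproot at the time; on newer installations the import name may be awkward0, and the values and variable names are made up):

```python
# Toy sketch of the .counts masking described above (awkward 0.x API).
import awkward

E = awkward.fromiter([[245013.2, 1.2], [], [57.0]])  # per-event track energies
mask = E.counts > 0            # events with at least one reconstructed track
first_track_E = E[mask][:, 0]  # first track of every non-empty event
print(first_track_E)           # energies of the first track in each non-empty event
```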
Trust me, it can be orders of magnitude faster: it takes just a second to filter and extract some parameters as numpy arrays from hundreds of thousands of events (I am doing RBR analysis and go full whack with numpy).
Different-sized sub-arrays are no problem; uproot uses awkward arrays, which are essentially specialized numpy arrays. You work with them as if they were numpy arrays.
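For example (again awkward 0.x, the version bundled with uproot at the time; toy values):

```python
# Jagged (different-sized) sub-arrays behave much like numpy arrays.
import awkward

jagged = awkward.fromiter([[1.0, 2.0, 3.0], [], [4.0]])
print(jagged * 2)     # broadcasts element-wise, like numpy
print(jagged.counts)  # entries per event: [3 0 1]
```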
OK, I just used our aanet file in the samples folder of km3io. It seems that this doubly-nested thing is a bit "awkward" to deal with. uproot is currently waiting for awkward-array 1.0, which is a complete rewrite of the current awkward library and has full numba support.
With numba, it's a no-brainer to write a dead-simple for loop and make it ultra fast.
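Something along these lines (just a sketch; the flat offsets/content layout is an assumption about how the jagged data would be handed to numba, not km3io or uproot API):

```python
# Sketch of the numba idea: a plain for loop over a jagged layout, compiled
# with @njit. Assumed layout: content holds all track energies back to back,
# and offsets[i]:offsets[i+1] delimits the tracks of event i.
import numba
import numpy as np

@numba.njit
def first_track_energy(offsets, content):
    out = np.full(len(offsets) - 1, np.nan)
    for i in range(len(offsets) - 1):
        if offsets[i + 1] > offsets[i]:   # event has at least one track
            out[i] = content[offsets[i]]  # energy of its first track
    return out

offsets = np.array([0, 2, 2, 3])           # 3 events: 2 tracks, 0 tracks, 1 track
content = np.array([245013.2, 1.2, 57.0])  # all track energies, flattened
print(first_track_energy(offsets, content))  # -> [245013.2, nan, 57.0]
```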
For now, I think I'll just ask the awkward guys how to deal with this specific problem efficiently. There must be a way.
Not sure if they will address it soon, as they are working on awkward 1.0, which will be completely reimplemented, but let's see. Maybe we are doing something wrong.
Alright, I'll stick with the ugly implementation for now then. It's not bad enough to slow me down seriously or anything. When the fix is available, I'll upgrade.
Get the fit data of the best-reconstructed track, "best" being defined as the track with the longest rec_stages (of course this may not always be [1, 2, 3, 4, 5]!).
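In other words, roughly this (a plain-Python illustration of the selection rule, not the km3io implementation):

```python
# Illustration only (not the km3io code): "best" track = the one with the
# longest rec_stages list; an event without tracks has no best track.
def best_track_index(rec_stages_per_track):
    """rec_stages_per_track: one list of rec stages per track in a single event."""
    if not rec_stages_per_track:
        return None
    lengths = [len(stages) for stages in rec_stages_per_track]
    return lengths.index(max(lengths))

# Example: the second track wins because it has all five rec stages.
print(best_track_index([[1, 2], [1, 2, 3, 4, 5], [1]]))  # -> 1
```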
Please do not confuse the two.
You can see that best_E.size is 45 and E.size is 41.
So this means that in your file (which has 50 events) there are 45 reconstructed tracks and 5 empty (non-reconstructed) events. Among these 45 reconstructed tracks, 41 are fully reconstructed (with rec_stages = [1, 2, 3, 4, 5]).
More information is available in the docs + README.
This looks cool; however, I would then need some way to also extract mc_tracks, events, and hits for the corresponding fully-reconstructed events (otherwise the arrays won't match in length).
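What I have in mind is something like this: build one per-event boolean mask and apply it to every branch so everything stays aligned (a sketch with plain awkward 0.x arrays, not the km3io API; the variable names and values are placeholders):

```python
# Sketch of keeping branches aligned with one per-event mask
# (plain awkward 0.x arrays; names and values are placeholders).
import awkward

tracks_E    = awkward.fromiter([[245013.2], [], [57.0, 3.1]])
mc_tracks_E = awkward.fromiter([[1e6], [2e5], [3e4, 5e3]])
hits_t      = awkward.fromiter([[10.0, 11.0], [12.0], [13.0]])

mask = tracks_E.counts > 0        # or any per-event "fully reconstructed" criterion
tracks_E_sel    = tracks_E[mask]
mc_tracks_E_sel = mc_tracks_E[mask]
hits_t_sel      = hits_t[mask]    # all three keep the same number of events
```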
I think I like it. The question is, of course, how performant it is, but I guess for now the biggest thing is that we have a fully working version and a test suite with full coverage. We can make it faster later.