I have some promising news. The 58-uproot4-integration branch of km3io now implements a partial migration to uproot4. Luckily, h5extract fully works with that version, and the conversion takes ~30 minutes for a huge file with 143k events:
143381 cycles drained in 31.390205min (CPU 31.185114min). Memory peak: 3819.62 MB
Statistics are based on the last 100000 cycles.
Ah, sorry for the spam, I also wanted to ping @vpestel
So as you can see, a full conversion (reading ROOT and writing HDF5) of all the supported information (at this moment, basically everything) takes half an hour.
Without writing HDF5 (just parsing and creating all the Tables in memory), it takes 8 minutes for the whole file.
Just looping over the ROOT file in the Pipeline takes 45 seconds.
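To make the split between the stages easier to see, here is a rough back-of-the-envelope sketch using only the three figures quoted above. It assumes the timings are nested (looping is part of tabulating, which is part of the full conversion) and that they can be subtracted even though they come from separate runs, so the per-stage numbers are estimates rather than measurements. The function and variable names are made up for this illustration.

```python
# Figures quoted in the messages above. The per-stage split is an estimate,
# because the three timings come from separate runs.
N_EVENTS = 143_381
FULL_MIN = 31.390205   # read ROOT + build tables + write HDF5
TABLES_MIN = 8.0       # read ROOT + build tables in memory (rounded)
LOOP_MIN = 45 / 60     # just looping over the ROOT file (rounded)


def stage_breakdown(full, tables, loop):
    """Estimate minutes per stage by subtracting the nested timings."""
    return {
        "root_loop": loop,
        "tabulating": tables - loop,
        "hdf5_writing": full - tables,
    }


breakdown = stage_breakdown(FULL_MIN, TABLES_MIN, LOOP_MIN)
events_per_second = N_EVENTS / (FULL_MIN * 60)

for stage, minutes in breakdown.items():
    share = 100 * minutes / FULL_MIN
    print(f"{stage:>13}: {minutes:5.2f} min ({share:4.1f} %)")
print(f"throughput: {events_per_second:.0f} events/s")
```

Under these assumptions, writing HDF5 accounts for roughly three quarters of the wall time, and building the tables for most of the rest.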
Here is the full list of the timeit statistics for the whole processing and converting. Most of the time is spent in the RecoTableTabulator and the HDF5Sink. The 4 GB memory peak is a bit annoying, but it seems to be coming from uproot4 itself; I am investigating.
Alright, finished. Let's merge to master, tag version 0.18 and wait for your complaints about broken scripts.
I tried to stick to the existing API and avoided breaking it, but there are a few tiny changes here and there. In particular, the usr stuff is now different, but it comes with a massive performance improvement.
Instead of
f.events.usr.RecoQuality
you now use the free function km3io.tools.usr(obj, attribute). It sounds like a step back, but it's actually a much better way, and I have full control over the optimisations behind the scenes. This is what it looks like now:
>>> km3io.tools.usr(f.events, "RecoQuality")
<Array [85.5, 68.7, 50.2] type='3 * float64'>
# or simply (since `f` is now the same as `f.events`)
>>> km3io.tools.usr(f, "RecoQuality")
<Array [85.5, 68.7, 50.2] type='3 * float64'>
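For readers wondering why a free function can be easier to optimise than attribute access, here is a minimal toy sketch. It is not the km3io implementation: `ToyEvents`, its layout (one shared list of field names plus one row of values per event) and the second field name `RecoNDF` are assumptions made up for illustration. The point is only that a function taking the whole object can resolve the name once and then slice every event in one pass.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyEvents:
    """Hypothetical stand-in for an events object; not a km3io class."""
    usr_names: List[str]    # field names, shared by all events (assumption)
    usr: List[List[float]]  # one row of values per event (assumption)


def usr(obj: ToyEvents, attribute: str) -> List[float]:
    """Return the values of `attribute` for every event."""
    # Resolve the name once for the whole file, not once per event.
    idx = obj.usr_names.index(attribute)
    return [row[idx] for row in obj.usr]


events = ToyEvents(
    usr_names=["RecoQuality", "RecoNDF"],
    usr=[[85.5, 37.0], [68.7, 29.0], [50.2, 21.0]],
)
quality = usr(events, "RecoQuality")
print(quality)
```

With the attribute style, the lookup logic has to live on the object itself; with a free function, it can be changed or specialised without touching the class.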
Tamas Gal marked the checklist item "best track methods to work with the new uproot4 structures" as completed
Tamas Gal marked the checklist item "fancy indexing on events/hits/tracks" as completed
Tamas Gal marked the checklist item "all the other tools from @zaly in tools.py" as completed