Extracting events for corresponding fully-reconstructed events

@zaly done ;)

mentioned in merge request !15 (merged)

Thank you @pkalaczynski for reporting this.

New methods have been implemented in km3io (currently in the add-reco-hits-events branch) and will soon be merged into the master branch.

The new methods are get_reco_hits, get_reco_tracks and get_reco_events. All three are used in exactly the same way, so I will only present one here:

>>> import km3io as ki

>>> r = ki.OfflineReader(myfile)

>>> r.get_reco_tracks([1, 2, 3, 4, 5], ["pos_x", "pos_y"])

{'pos_x': array([-647.39638136,  448.98490051,  451.12336854,  174.23666051,
         207.24223984, -460.75770881, -522.58197621,  324.16230509,
        -436.2319534 ]),
 'pos_y': array([-138.62068609,   77.58887593,  251.08805881, -114.60614519,
         143.61947974,   86.85012087, -263.14983599, -203.14263572,
         467.75113594])}

Please note that the list of keys you pass to get_reco_tracks MUST correspond to attributes of the tracks class.

For more details, please refer to the README.

closed

reopened

@pkalaczynski could you please explain why the issue is reopened?

Sorry for the late reply, but I have to reopen: get_mc_tracks is missing.

Oh OK! I forgot the mc classes, sorry! I will implement it right now and provide it to you. Give me 30 min ;)

By this I mean mc_tracks corresponding to reconstructed tracks

No worries ;)

@pkalaczynski

Could you please provide a tiny file with some data in the mc_hits and mc_tracks classes, so that I can test my code?

Thanks

@pkalaczynski

Now you can get your mc information as well, with the same methods as before, by simply adding the optional parameter mc=True.

For example, if you need the mc_tracks information, you would type:

r.get_reco_tracks([1, 2, 3, 4, 5], ["pos_x", "pos_y"], mc=True)

This looks for the reconstruction stages in the mc tracks tree and extracts pos_x and pos_y from the mc tracks tree as well.

Please let me know if this corresponds to what you requested (I tested all the mc data with a file full of zeros, so I am curious to hear back from you). Please also share a tiny file with some data (non-zero data in the mc trees) so that I can run further tests if needed.

closed

reopened

@pkalaczynski

Could you please provide the following:

an example of your code that raised the error?
the path to your file?
a detailed description of what you would like to extract from your file.

Thanks for your contribution :)

If I understand correctly, you just want the index mask of the events containing the best reconstructed tracks. With that mask you can select hits, events or anything else matching the best tracks.

That's what is currently implemented in the get_best_XX methods...
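A minimal sketch of the mask idea, assuming each event's reconstructed tracks are available as a plain Python list of per-track rec_stages lists (a toy stand-in, not the km3io data structure and not the get_best_XX code). The hypothetical helper fully_reco_mask returns one boolean per event, True if any track in that event has exactly the requested stages; the same mask can then pick the matching entries of any other per-event data, here toy MC track types.

```python
def fully_reco_mask(rec_stages_per_event, stages):
    """Return one bool per event: True if any track has exactly `stages`.

    Hypothetical toy layout: a list of events, each a list of tracks,
    each track given by its list of rec_stages.
    """
    wanted = list(stages)
    return [any(list(track) == wanted for track in event)
            for event in rec_stages_per_event]


def apply_mask(per_event_data, mask):
    """Keep only the per-event entries whose mask value is True."""
    return [item for item, keep in zip(per_event_data, mask) if keep]


# Toy input: three events, only the first and the third are fully reconstructed.
rec_stages = [
    [[1, 2, 3, 4, 5], [1, 2, 3]],
    [[1, 2], []],
    [[1, 2, 3, 4, 5]],
]
mask = fully_reco_mask(rec_stages, [1, 2, 3, 4, 5])   # [True, False, True]
mc_types = [[14, 13, 211], [-14, 13], [14, 13]]       # toy per-event MC track types
selected = apply_mask(mc_types, mask)                 # [[14, 13, 211], [14, 13]]
```

Because the mask only depends on the reconstructed tracks, the selected MC entries keep all their tracks per event, which is what is needed for quantities such as multiplicity.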

I would also like to understand why it is necessary to link mc_tracks to tracks data in the analysis.

Code: (file path included in code)

import km3pipe as kp
from km3pipe.dataclasses import Table
import numpy as np
from glob import glob
import os.path
import km3io as ki
import h5py

f = '/sps/km3net/users/kakiczi/mcv5.8.DAT005000.gSeaGen.sirene.jte.jchain.aanet.5000.root' 
r=ki.OfflineReader(f) # 1,2,3,4,5 -> fully - reconstructed
reco_tracks = r.get_reco_tracks(stages=[1, 2, 3, 4, 5], keys=["pos_x", "pos_y", "pos_z","dir_x", "dir_y", "dir_z","E"], mc=False)
mc_tracks = r.get_reco_tracks(stages=[1, 2, 3, 4, 5], keys=["type", "E"], mc=True)

Output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pbs/home/k/kakiczi/.pyenv/versions/3.8.0/lib/python3.8/site-packages/km3io/offline.py", line 739, in get_reco_tracks
    raise ValueError(
ValueError: The stages [1, 2, 3, 4, 5] are not found in your file.

It works with mc=False, but not with mc=True. I guess it tries to find the reco stages inside mc_tracks, where they are obviously not to be found. I think the only solution is index matching between mc_tracks and reco_tracks (though I may be wrong).

Yep, I want pretty much exactly what you wrote @tgal.

@zaly I guess an example will be most useful here. In my script I want to do this:

VARS[0].extend(mc_tracks['type']) # primary
VARS[1].extend(sum(reco_hits['tot']))
VARS[2].extend(sum(reco_hits['trig']))
VARS[3].extend(len(reco_hits['trig']))
VARS[4].extend(reco_events['overlays'])
VARS[5].extend(reco_events['w'][:,2]) # (hopefully) w3 weights
VARS[6].extend(len(mc_tracks['type'])-1) # multiplicity (-1 because of the primary)
VARS[7].extend(reco_tracks['pos_x'][:,0])
VARS[8].extend(reco_tracks['pos_y'][:,0])
VARS[9].extend(reco_tracks['pos_z'][:,0])
VARS[10].extend(reco_tracks['dir_x'][:,0])
VARS[11].extend(reco_tracks['dir_y'][:,0])
VARS[12].extend(reco_tracks['dir_z'][:,0])
VARS[13].extend(reco_tracks['E'][:,0]) # energy
VARS[14].extend(mc_tracks['E'][:,0]) # primary energy
VARS[15].extend(np.ones(len(reco_tracks['pos_x'])) * int(r.header['norma'].split()[1]) * len(FNAMES))

So as you can see I need the corresponding info from mc_tracks, reco_tracks and reco_events

In this particular case I want to get the multiplicity, primary type and primary energy from the true mc info and I want it ONLY for the reconstructed events.
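Given a per-event mask of fully reconstructed events, these three quantities can be derived from the true MC information. A minimal sketch, assuming the MC tracks of each event are a list whose first entry is the primary (the same assumption behind the `-1` in the multiplicity line of the script above); mc_summary and the toy data are hypothetical.

```python
def mc_summary(mc_types, mc_energies, mask):
    """Primary type, primary energy and multiplicity for masked events only.

    Assumes (hypothetically) that the first MC track of each event is the
    primary, so multiplicity = number of MC tracks - 1.
    """
    primary_type, primary_energy, multiplicity = [], [], []
    for types, energies, keep in zip(mc_types, mc_energies, mask):
        if not keep:
            continue  # skip events without a fully reconstructed track
        primary_type.append(types[0])
        primary_energy.append(energies[0])
        multiplicity.append(len(types) - 1)
    return primary_type, primary_energy, multiplicity


mc_types = [[14, 13, 211], [-14, 13], [14, 13]]
mc_energies = [[1e6, 8e5, 1e4], [5e4, 4e4], [2e5, 1.5e5]]
mask = [True, False, True]  # toy mask: events 0 and 2 are fully reconstructed
ptype, penergy, mult = mc_summary(mc_types, mc_energies, mask)
# ptype == [14, 14], penergy == [1e6, 2e5], mult == [2, 1]
```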

@pkalaczynski

OK, now I understand what you want. What is currently implemented treats the mc_tracks class and the tracks class as two completely separate worlds! When you use mc=True, it only looks at the mc_tracks data and ignores the tracks class. Since mc_tracks have no rec_stages, the error is raised!

km3pipe ❯ ipython
Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import km3io as ki                                                                                           

In [2]: r = ki.OfflineReader("/home/zineb/km3net/temp/mcv5.8.DAT005000.gSeaGen.sirene.jte.jchain.aanet.5000.root")   

In [3]: reco_tracks = r.get_reco_tracks(stages=[1, 2, 3, 4, 5], keys=["pos_x", "pos_y", "pos_z","dir_x", "dir_y", "dir_z","E"], mc=False)

In [4]: reco_tracks                                                                                                  
Out[4]: 
{'pos_x': array([177.84781792, 207.27802553, 189.1657712 , 133.79471471,
        254.42190319, 183.81533394, 205.54856171, 219.62538429,
        305.50516418, 192.98440688, 134.50912383,   0.33646153,
        187.24709216, 170.25124195, 175.12584118, 218.47878668,
        146.72331885, -15.03363138,  53.6795057 , -14.07655192,
        246.90569737, 231.61690933, 270.56147897, 113.51247141,
        241.38530136, 175.2704682 , 151.25265581,  76.54592073]),
 'pos_y': array([343.0976011 , 405.28678653, 218.54893561, 342.1424884 ,
         88.82025319, 215.20820978, 166.44164179, 214.56306312,
        251.26863742, 172.11843674, 291.38261377, 354.17269736,
        106.98805098, -36.16731277, 139.97416489,  89.70604746,
        118.91217091,   5.7818432 , 241.55424906,  55.9568665 ,
        130.31828636,  75.22846364,  58.51461692, 115.67916548,
        272.98320989, 144.27619142, 254.21158207, 271.59754644]),
 'pos_z': array([323.5258727 , 523.70964183, 734.9092803 , 625.3592691 ,
        644.39361084, 703.86352302, 530.10502057, 515.20415013,
        289.85077458, 748.39594442, 414.16086465, 514.48329745,
        716.05045732, 742.43283063, 597.23750138, 612.56798952,
        556.52943487, 731.56929799, 449.21912695, 378.47751453,
        808.99123053, 548.07772433, 580.95850975, 195.57989174,
        257.2383528 , 744.30486231, 710.8698719 , 540.12141305]),
 'dir_x': array([-0.36073691, -0.23487029, -0.21492644, -0.03270867, -0.68090838,
        -0.51889295, -0.4850477 , -0.67364951, -0.5586593 , -0.26528346,
         0.27095708,  0.49022243, -0.54394489, -0.17785976, -0.96426854,
        -0.8555766 , -0.26856206,  0.29564179,  0.63278235,  0.6111149 ,
        -0.23625814, -0.97065121, -0.60206788, -0.18022527, -0.20073554,
        -0.76769837,  0.36879147,  0.15217303]),
 'dir_y': array([-0.88945128, -0.76391132,  0.01614322, -0.86778236,  0.60992083,
        -0.24459038,  0.41009707, -0.13150787, -0.07171977, -0.28545098,
        -0.58334614, -0.77151642,  0.41235937,  0.47331912, -0.101016  ,
         0.44384211,  0.6688397 ,  0.58012447, -0.16579897,  0.24031641,
        -0.18406053,  0.2386617 ,  0.04699138,  0.22802451, -0.48198237,
         0.09124815, -0.56819112, -0.59891836]),
 'dir_z': array([ 0.28061594, -0.60106193, -0.97649681, -0.49586683, -0.40541381,
        -0.81910052, -0.77236593, -0.7272565 , -0.82629054, -0.92094648,
        -0.76569546, -0.40551742, -0.73081031, -0.86274847, -0.24491213,
        -0.26644486, -0.69320118, -0.75898059, -0.75637108, -0.75417942,
        -0.95409843, -0.02961111, -0.7970609 , -0.95683002, -0.85287645,
        -0.63428148, -0.73563013, -0.78621891]),
 'E': array([2.31514573e+04, 6.90207045e+05, 2.18221522e+05, 1.84145729e+05,
        8.35673684e+04, 6.49944570e+04, 5.81782489e+03, 6.96332634e+05,
        5.83543507e+05, 1.72947614e+05, 2.41106273e+04, 4.93355367e+05,
        1.84145729e+05, 2.58442747e+05, 5.23175271e+03, 1.84672833e+04,
        9.03473577e+03, 1.55684041e+06, 5.53695711e+04, 3.25420998e+04,
        2.72595412e+05, 5.81782489e+03, 1.04192434e+05, 9.46727091e-01,
        6.31895511e+04, 7.47338093e+05, 4.48795292e+04, 2.18221522e+05])}

In [5]: mc_tracks = r.get_reco_tracks(stages=[1, 2, 3, 4, 5], keys=["type", "E"], mc=True)                           
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-ca5c5d0d7316> in <module>
----> 1 mc_tracks = r.get_reco_tracks(stages=[1, 2, 3, 4, 5], keys=["type", "E"], mc=True)

~/km3net/km3net/km3io/km3io/offline.py in get_reco_tracks(self, stages, keys, mc)
    739             raise ValueError(
    740                 "The stages {} are not found in your file.".format(
--> 741                     str(stages)))
    742         else:
    743             for key in keys:

ValueError: The stages [1, 2, 3, 4, 5] are not found in your file.

In [6]: r.mc_tracks.rec_stages                                                                                       
Out[6]: <ChunkedArray [[[], [], [], ..., []] [[], [], [], ..., []] [[], [], [], ..., []] ... [[], [], [], ..., []] [[], [], [], ..., []] [[], [], [], ..., []]] at 0x7f06df0ee9d0>

Anyway, I will provide this in a different branch for you. I will keep the master branch as it is for the moment, because @tgal and I believe this needs more thought (the current implementation is a temporary, transitional solution).

Sure, I'm fine with anything that works

We always face the same issue with this aanet stuff. One class for everything... mc_tracks should not have rec_stages, but since in aanet it's the same class, you cannot do much about it. However, we can hack it...

I am looking forward to https://github.com/scikit-hep/awkward-1.0 which we could use to create these structures.

@pkalaczynski

What you asked for is now in the branch called piotr-selection.

So now the reconstruction stages are looked up ONLY in the tracks tree (no longer in mc_tracks). When you pass mc=True to any get_reco_XX method, it uses the same event indices as those found with rec_stages=[1, 2, 3, 4, 5] in the tracks tree!
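The lookup can be pictured as two steps: find the indices of events with a matching track using the rec_stages of the tracks tree only, then reuse those event indices to pick the MC information. This is a minimal sketch with toy lists and hypothetical names (matching_event_indices, select_events); it is not the code in the piotr-selection branch. It also shows why the old behaviour failed: the MC rec_stages are all empty, so nothing ever matches.

```python
def matching_event_indices(track_rec_stages, stages):
    """Indices of events with at least one track whose rec_stages equal `stages`.

    Only the reconstructed tracks should be passed here; MC tracks have
    empty rec_stages, so searching them never matches.
    """
    wanted = list(stages)
    indices = [i for i, event in enumerate(track_rec_stages)
               if any(list(t) == wanted for t in event)]
    if not indices:
        raise ValueError(
            "The stages {} are not found in your file.".format(str(stages)))
    return indices


def select_events(per_event_data, indices):
    """Pick the entries of any per-event data at the given event indices."""
    return [per_event_data[i] for i in indices]


tracks_rec_stages = [[[1, 2, 3, 4, 5]], [[1, 2]], [[1, 2, 3, 4, 5], [1]]]
mc_tracks_rec_stages = [[[], []], [[]], [[], [], []]]  # always empty in MC

idx = matching_event_indices(tracks_rec_stages, [1, 2, 3, 4, 5])  # [0, 2]
mc_primary_type = select_events([14, -14, 14], idx)               # [14, 14]

try:
    matching_event_indices(mc_tracks_rec_stages, [1, 2, 3, 4, 5])
except ValueError as err:
    print(err)  # The stages [1, 2, 3, 4, 5] are not found in your file.
```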

~/km3net/km3net/km3io piotr-selection*
km3pipe ❯ ipython
Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import km3io as ki                                                                                           
In [2]: r = ki.OfflineReader("/home/zineb/km3net/temp/mcv5.8.DAT005000.gSeaGen.sirene.jte.jchain.aanet.5000.root")   

In [3]: mc_tracks = r.get_reco_tracks(stages=[1, 2, 3, 4, 5], keys=["type", "E"], mc=True)                           

In [4]: mc_tracks                                                                                                    
Out[4]: 
{'type': array([14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14,
        14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14], dtype=int32),
 'E': array([4.252786e+07, 6.617208e+08, 8.051126e+06, 3.395941e+08,
        2.484416e+08, 1.823304e+08, 1.228110e+07, 4.646829e+08,
        8.994611e+08, 2.012423e+08, 6.452572e+08, 3.866465e+08,
        7.675982e+07, 5.640656e+07, 1.032356e+08, 1.149066e+08,
        1.962799e+08, 6.439778e+08, 2.773774e+07, 1.162574e+08,
        8.991505e+08, 2.559056e+07, 1.635778e+08, 2.256937e+07,
        4.172996e+08, 5.583680e+08, 8.502509e+06, 5.496416e+07])}

I hope this helps! Good luck for your analysis

By the way, for the future: I think we could adopt the naming convention used in many other low-level libraries, bestreco(tracks) and argbestreco(tracks). The first returns the data, the second only the mask, etc.
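The proposed pair mirrors the familiar max/argmax split. A minimal sketch with hypothetical argbestreco and bestreco functions on toy per-event track lists: "best" is assumed here to mean the highest-likelihood track among those with the requested rec_stages, and None marks events without such a track. The arg-variant returns per-event indices rather than a boolean mask; a mask follows from checking which entries are not None.

```python
def argbestreco(events, stages):
    """Per event, index of the highest-likelihood track with exactly `stages`.

    Hypothetical toy layout: `events` is a list of events, each a list of
    track dicts with "rec_stages" and "lik" keys. Events without a matching
    track get None instead of an index.
    """
    wanted = list(stages)
    result = []
    for tracks in events:
        candidates = [i for i, t in enumerate(tracks)
                      if list(t["rec_stages"]) == wanted]
        if candidates:
            result.append(max(candidates, key=lambda i: tracks[i]["lik"]))
        else:
            result.append(None)
    return result


def bestreco(events, stages):
    """Return the best tracks themselves, skipping events without one."""
    return [tracks[i]
            for tracks, i in zip(events, argbestreco(events, stages))
            if i is not None]


events = [
    [{"rec_stages": [1, 2, 3, 4, 5], "lik": 10.0, "E": 2.0e4},
     {"rec_stages": [1, 2, 3, 4, 5], "lik": 30.0, "E": 6.9e5}],
    [{"rec_stages": [1, 2], "lik": 50.0, "E": 1.0e3}],
]
best_idx = argbestreco(events, [1, 2, 3, 4, 5])      # [1, None]
mask = [i is not None for i in best_idx]             # [True, False]
best = bestreco(events, [1, 2, 3, 4, 5])             # one track with E = 6.9e5
```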

Sure! I suggest we discuss these details in a phone call after your code review, or on issue #31 (closed). I am happy to add any naming conventions you think are necessary ;)

Absolutely. We need to combine our minds to tackle this...

Thanks @zaly, it works. However, it still does not provide all the functionality I need. Since I have an ugly (but working) alternative, I would say it does not make sense for you to write another bunch of functions just for me. You had better focus on the general solution.

As @tgal said, a mask that only shows the entries corresponding to the fully reconstructed events would be best. Right now I cannot, e.g., extract multiplicities from r.get_reco_tracks(stages=[1, 2, 3, 4, 5], keys=["type", "E"], mc=True), since I get only one entry per event (and multiplicity is not a standard key in offline files).

This might happen again if someone wants to compute something other than multiplicity, so I think we really need a general solution.

@pkalaczynski

Thanks for reporting back.

@tgal and I are aware of this, I only wanted to help you make progress on your work...

km3io is still under development, so this is by no means the final version but a temporary solution to help people make progress with their work. The idea behind the get_reco_XX methods is to select the tracks with specific reco stages and the highest likelihood. So if you need multiplicity, the get_reco_XX methods are not suited for that!
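To see why multiplicity cannot come out of such a selection: keeping only the highest-likelihood track with the requested stages leaves at most one track per event, so every key maps to one value per selected event. A minimal sketch with toy data and a hypothetical reco_like_selection function (only mimicking the output shape, not the km3io implementation).

```python
def reco_like_selection(events, stages, keys):
    """Toy stand-in (hypothetical) for the shape of the get_reco_XX output.

    For each event, keep only the highest-likelihood track whose rec_stages
    equal `stages`; return {key: [one value per selected event]}.
    """
    wanted = list(stages)
    out = {key: [] for key in keys}
    for tracks in events:
        candidates = [t for t in tracks if list(t["rec_stages"]) == wanted]
        if not candidates:
            continue  # event dropped: no track with the requested stages
        best = max(candidates, key=lambda t: t["lik"])
        for key in keys:
            out[key].append(best[key])
    return out


events = [
    [{"rec_stages": [1, 2, 3, 4, 5], "lik": 5.0, "pos_x": 177.8},
     {"rec_stages": [1, 2, 3, 4, 5], "lik": 2.0, "pos_x": 150.0},
     {"rec_stages": [1, 2], "lik": 9.0, "pos_x": 0.0}],
    [{"rec_stages": [1, 2], "lik": 9.0, "pos_x": 0.0}],
    [{"rec_stages": [1, 2, 3, 4, 5], "lik": 1.0, "pos_x": 207.3}],
]
selected = reco_like_selection(events, [1, 2, 3, 4, 5], ["pos_x"])
# {'pos_x': [177.8, 207.3]}: three tracks in the first event became one value
```

Whatever number of tracks an event had, at most one value survives, so a count such as the MC multiplicity has to come from the full per-event data selected with an event mask instead.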

I am on holiday for a week, so I cannot promise to provide anything right now. But @tgal and I will combine efforts to get a more robust solution for this.

Thanks again for your contribution! Sorry, I really did my best to help you before going on holiday.

This issue will stay open until a better solution is available.

I appreciate that, but I see it's just getting too complicated this way.

Yep, they are not. Don't feel too bad, this stuff is just messy and annoying, and I'm quite fine with what I have now. I can generate HDF5 files with the variables I need; this would only be a performance (and, let's say, script attractiveness ;) ) boost for me.

I really appreciate what you guys do with km3io, it has already made my life a lot easier!

Glad to hear :)

I'll work on this in the next few days!

@tgal

This issue is also directly related to the tracks-slicing issue #15 (closed). @pkalaczynski's issue initiated the work on a new API design for the offline module. Since no further work will be done on the current offline design to address Piotr's tracks-slicing problem, I suggest we close this issue and keep issue #15 (closed) open to focus on the new API design of the offline module. I do not think a quick fix can be provided with the current offline design anyway! Meanwhile, users can rely on the available methods to work around the slicing issues while waiting for the official release of the new design.

I will let you close the issue if you agree.

Sure, it's ok.

closed

Yes I agree, now that we have more real use cases it's easier to adapt the API to match those.
