Support for aanet 'usr' fields in OfflineReader

You mean the usr field of the event file itself?

The usr stuff from tracks and MC hits is accessible.

Btw. I really dislike this usr "hack" we did in the aanet format.

Thanks Tom. I want to access what one would access in aanet via evt.getusr(key). In this case it's a set of custom variables used by the classifier in the online event selection.

I am not sure I understand what you mean by "the usr stuff from tracks and MC hits"; sorry for my naivety.

In general, I agree we should not encourage the proliferation of user-defined fields without some discussion on whether and how to support them. On the other hand, there are a few situations in which they are used as a temporary or intermediate solution before a definitive data format is defined.

Can you point me to a file which has some values in the usr thing?

/sps/km3net/users/mlincett/online_ev/ORCAKM3Online.00000049_00007289.root

One of the keys for the custom fields is DeltaPosZ.

Thanks for looking into this!

added awaiting feedback usability labels

Alright, it seems that uproot fails to interpret it automatically. This is basically true for everything which comes from aanet. The Jpp stuff works 95% of the time...

Anyway, I extracted the data and now need to find out where this DeltaPosZ is hidden:

In [42]: f = uproot.open("/sps/km3net/users/mlincett/online_ev/ORCAKM3Online.00000049_00007289.root")

In [43]: f['E']['Evt']['usr'].show()
usr                        TStreamerSTL               asjagged(asdtype('>f8'), 10)

In [44]: f['E']['Evt']['usr'].array(uproot.asdebug)
Out[44]: <JaggedArray [[64 0 0 ... 172 204 139] [64 0 0 ... 148 91 111] [64 0 0 ... 159 29 165] ... [64 0 0 ... 76 108 41] [64 0 0 ... 146 53 237] [64 0 0 ... 169 235 8]] at 0x7f019cf4e410>

In [45]: f['E']['Evt']['usr'].array(uproot.asdebug)[0]
Out[45]: 
array([ 64,   0,   0, 142,   0,   9,   0,   0,   0,  17,  64,  85,  93,
       105, 162,  46,  82,  50,  64,  66, 128,   0,   0,   0,   0,   0,
        64,  93, 168,  86, 136,  91, 178, 115,  64, 137, 200,   0,   0,
         0,   0,   0,  64, 102,   0,   0,   0,   0,   0,   0,  64, 132,
        72,   0,   0,   0,   0,   0,  63, 203,  78, 129, 180, 232,  27,
        79,  64,  66, 194, 132, 204, 220, 117,  42,  64,  96, 233, 112,
       157, 178, 248, 146,  64,  88, 113, 158, 212, 247, 182, 143,  64,
        73, 128,   0,   0,   0,   0,   0,  64,  62,   0,   0,   0,   0,
         0,   0,  64,  28,   0,   0,   0,   0,   0,   0,  64,  24,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,  63, 197, 149, 203,   6,
       172, 204, 139], dtype=uint8)

In [46]: f['E']['Evt']['usr'].array(uproot.asdebug)[0].tostring()
Out[46]: b'@\x00\x00\x8e\x00\t\x00\x00\x00\x11@U]i\xa2.R2@B\x80\x00\x00\x00\x00\x00@]\xa8V\x88[\xb2s@\x89\xc8\x00\x00\x00\x00\x00@f\x00\x00\x00\x00\x00\x00@\x84H\x00\x00\x00\x00\x00?\xcbN\x81\xb4\xe8\x1bO@B\xc2\x84\xcc\xdcu*@`\xe9p\x9d\xb2\xf8\x92@Xq\x9e\xd4\xf7\xb6\x8f@I\x80\x00\x00\x00\x00\x00@>\x00\x00\x00\x00\x00\x00@\x1c\x00\x00\x00\x00\x00\x00@\x18\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00?\xc5\x95\xcb\x06\xac\xcc\x8b'

Here it is:

In [52]: f['E']['Evt']['usr_names'].array(uproot.asdebug)[0].tostring()
Out[52]: b'@\x00\x00\xc8\x00\t\x00\x00\x00\x11\x0bRecoQuality\x07RecoNDF\x03CoC\x03ToT\x0bChargeAbove\x0bChargeBelow\x0bChargeRatio\tDeltaPosZ\rFirstPartPosZ\x0cLastPartPosZ\tNSnapHits\tNTrigHits\tNTrigDOMs\nNTrigLines\x0eNSpeedVetoHits\x11NGeometryVetoHits\x12ClassficationScore'

So basically the scheme seems to be:

<HEADER><LENGTH1>text1<LENGTH2>text2<LENGTH3>text3...
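A minimal Python sketch of decoding such a blob, assuming the 10-byte header is a 4-byte big-endian byte count (with a flag bit set, which is why the dumps start with @), a 2-byte version and a 4-byte element count, and that each string is preceded by one length byte. This layout is inferred from the dumps in this issue (17 names, and the byte count 0xc8 = 200 adds up), not taken from a specification; the escape for strings of 255 or more characters follows ROOT's string streaming convention and is also an assumption. parse_usr_names and RAW_USR_NAMES are hypothetical names.

```python
import struct
from typing import List

# Header layout inferred from the dumps in this issue (an assumption, not a spec):
#   bytes 0-3: big-endian byte count with a flag bit set (0x40 -> b"@")
#   bytes 4-5: version
#   bytes 6-9: number of elements (0x11 = 17 here)
HEADER_SIZE = 10

# usr_names of the first event, copied from the dump above
RAW_USR_NAMES = (
    b"@\x00\x00\xc8\x00\t\x00\x00\x00\x11"
    b"\x0bRecoQuality\x07RecoNDF\x03CoC\x03ToT\x0bChargeAbove\x0bChargeBelow"
    b"\x0bChargeRatio\tDeltaPosZ\rFirstPartPosZ\x0cLastPartPosZ\tNSnapHits"
    b"\tNTrigHits\tNTrigDOMs\nNTrigLines\x0eNSpeedVetoHits\x11NGeometryVetoHits"
    b"\x12ClassficationScore"
)


def parse_usr_names(raw: bytes) -> List[str]:
    """Decode <HEADER><LENGTH1>text1<LENGTH2>text2... into a list of strings."""
    (n_names,) = struct.unpack(">i", raw[6:HEADER_SIZE])
    names = []
    pos = HEADER_SIZE
    for _ in range(n_names):
        length = raw[pos]  # one length byte per string
        pos += 1
        if length == 255:
            # Assumed escape for long strings (ROOT convention): the real
            # length follows as a 4-byte big-endian integer
            (length,) = struct.unpack(">i", raw[pos:pos + 4])
            pos += 4
        names.append(raw[pos:pos + length].decode("ascii"))
        pos += length
    return names


names = parse_usr_names(RAW_USR_NAMES)
print(len(names), names[7])  # 17 DeltaPosZ
```

The decoded order matches the index table that km3io builds further down in this thread (DeltaPosZ at position 7).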

created merge request !20 (merged) to address this issue

mentioned in merge request !20 (merged)

Could you cross-check?

In [61]: f['E']['Evt']['usr'].array(uproot.asjagged(uproot.astable(uproot.asdtype([("value", "f8")])), skipbytes=10))[0].value
Out[61]: 
array([8.54595724e+01, 3.70000000e+01, 1.18630282e+02, 8.25000000e+02,
       1.76000000e+02, 6.49000000e+02, 2.13333333e-01, 3.75196777e+01,
       1.35294997e+02, 9.77753193e+01, 5.10000000e+01, 3.00000000e+01,
       7.00000000e+00, 6.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       1.68633822e-01])

In [62]: f['E']['Evt']['usr_names'].array(uproot.asdebug)[0].tostring()
Out[62]: b'@\x00\x00\xc8\x00\t\x00\x00\x00\x11\x0bRecoQuality\x07RecoNDF\x03CoC\x03ToT\x0bChargeAbove\x0bChargeBelow\x0bChargeRatio\tDeltaPosZ\rFirstPartPosZ\x0cLastPartPosZ\tNSnapHits\tNTrigHits\tNTrigDOMs\nNTrigLines\x0eNSpeedVetoHits\x11NGeometryVetoHits\x12ClassficationScore'

This means that the values of the first event are:

RecoQuality = 8.54595724e+01
RecoNDF = 37
CoC = 1.18630282e+02
ToT = 8.25000000e+02
ChargeAbove = 1.76000000e+02
ChargeBelow = 6.49000000e+02
ChargeRatio = 2.13333333e-01
DeltaPosZ = 3.75196777e+01
FirstPartPosZ = 1.35294997e+02
LastPartPosZ = 9.77753193e+01
NSnapHits = 5.10000000e+01
NTrigHits = 3.00000000e+01
NTrigDOMs = 7.00000000e+00
NTrigLines = 6.00000000e+00
NSpeedVetoHits = 0.00000000e+00
NGeometryVetoHits = 0.00000000e+00
ClassficationScore = 1.68633822e-01
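The same pairing can be reproduced with nothing but the standard library. The sketch below assumes the usr blob has the same 10-byte header as usr_names, followed by big-endian float64 values (which is what skipbytes=10 with the "f8" dtype on a big-endian branch expresses); the byte string is the first event's usr entry from the dump above, and parse_usr_values is a hypothetical helper.

```python
import struct
from typing import List

HEADER_SIZE = 10  # assumed: 4-byte byte count + 2-byte version + 4-byte element count

# Names of the first event, as decoded from the usr_names dump
USR_NAMES = [
    "RecoQuality", "RecoNDF", "CoC", "ToT", "ChargeAbove", "ChargeBelow",
    "ChargeRatio", "DeltaPosZ", "FirstPartPosZ", "LastPartPosZ", "NSnapHits",
    "NTrigHits", "NTrigDOMs", "NTrigLines", "NSpeedVetoHits",
    "NGeometryVetoHits", "ClassficationScore",
]

# usr of the first event, copied from the dump above (one float64 per line)
RAW_USR_FIRST_EVENT = (
    b"@\x00\x00\x8e\x00\t\x00\x00\x00\x11"  # header, 17 elements
    b"@U]i\xa2.R2"
    b"@B\x80\x00\x00\x00\x00\x00"
    b"@]\xa8V\x88[\xb2s"
    b"@\x89\xc8\x00\x00\x00\x00\x00"
    b"@f\x00\x00\x00\x00\x00\x00"
    b"@\x84H\x00\x00\x00\x00\x00"
    b"?\xcbN\x81\xb4\xe8\x1bO"
    b"@B\xc2\x84\xcc\xdcu*"
    b"@`\xe9p\x9d\xb2\xf8\x92"
    b"@Xq\x9e\xd4\xf7\xb6\x8f"
    b"@I\x80\x00\x00\x00\x00\x00"
    b"@>\x00\x00\x00\x00\x00\x00"
    b"@\x1c\x00\x00\x00\x00\x00\x00"
    b"@\x18\x00\x00\x00\x00\x00\x00"
    b"\x00\x00\x00\x00\x00\x00\x00\x00"
    b"\x00\x00\x00\x00\x00\x00\x00\x00"
    b"?\xc5\x95\xcb\x06\xac\xcc\x8b"
)


def parse_usr_values(raw: bytes) -> List[float]:
    """Read the element count from the header, then that many big-endian doubles."""
    (n_values,) = struct.unpack(">i", raw[6:HEADER_SIZE])
    end = HEADER_SIZE + 8 * n_values
    return list(struct.unpack(">%dd" % n_values, raw[HEADER_SIZE:end]))


# Pair names and values positionally, like evt.getusr(key) in aanet
usr = dict(zip(USR_NAMES, parse_usr_values(RAW_USR_FIRST_EVENT)))
print(usr["RecoNDF"], usr["NTrigLines"])  # 37.0 6.0
```

The positional zip is only valid because names and values of the same event are stored in the same order.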

Looks pretty scary but seems to be what I need! Thanks very much. I will get back to this later this afternoon (hopefully), but I guess the solution is provided and the issue can be closed :)

I'll make it pretty and fast.

You can try the raw version with:

pip install git+https://git.km3net.de/km3py/km3io.git@35-support-for-aanet-usr-fields-in-offlinereader

In [5]: import km3io

In [6]: f = km3io.OfflineReader("/sps/km3net/users/mlincett/online_ev/ORCAKM3Online.00000049_00007289.root")

In [7]: f.usr['NTrigLines']
Out[7]: <ChunkedArray [6.0 5.0 4.0 4.0 6.0 3.0 5.0 ...] at 0x7f9e3c63d1d0>

In [8]: f.usr
Out[8]: <km3io.offline.Usr at 0x7f9e2bf7fdd0>

In [9]: f.usr._usr_idx_lookup
Out[9]: 
{'RecoQuality': 0,
 'RecoNDF': 1,
 'CoC': 2,
 'ToT': 3,
 'ChargeAbove': 4,
 'ChargeBelow': 5,
 'ChargeRatio': 6,
 'DeltaPosZ': 7,
 'FirstPartPosZ': 8,
 'LastPartPosZ': 9,
 'NSnapHits': 10,
 'NTrigHits': 11,
 'NTrigDOMs': 12,
 'NTrigLines': 13,
 'NSpeedVetoHits': 14,
 'NGeometryVetoHits': 15,
 'ClassficationScore': 16}

What do you think?
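One way to picture what a name-to-index table like _usr_idx_lookup buys: if every event stores its usr values in the same order, a field is just one column of an (events x fields) table, so both f.usr['NTrigLines'] and f.usr.NTrigLines reduce to a dictionary lookup plus a column slice. The class below is a hypothetical pure-Python illustration of that idea, not km3io's Usr implementation; the toy values are rounded from the printout in this thread.

```python
from typing import Dict, List, Sequence


class UsrColumns:
    """Hypothetical name -> column-index lookup (not km3io's Usr class).

    Assumes every event stores its usr values in the same order as `names`.
    """

    def __init__(self, names: Sequence[str], rows: Sequence[Sequence[float]]):
        self._idx: Dict[str, int] = {name: i for i, name in enumerate(names)}
        self._rows = rows  # one list of values per event

    def __getitem__(self, name: str) -> List[float]:
        i = self._idx[name]  # KeyError for unknown fields
        return [row[i] for row in self._rows]

    def __getattr__(self, name: str) -> List[float]:
        # Called only when normal attribute lookup fails, e.g. usr.DeltaPosZ;
        # private names are excluded to avoid recursion on internal attributes
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def keys(self) -> List[str]:
        return list(self._idx)


usr = UsrColumns(
    ["DeltaPosZ", "NTrigLines"],
    [[37.52, 6.0], [-10.28, 5.0], [13.68, 4.0]],  # toy values for three events
)
print(usr["NTrigLines"], usr.DeltaPosZ)  # [6.0, 5.0, 4.0] [37.52, -10.28, 13.68]
```

The lookup table itself is tiny (one integer per field name), which is consistent with the low memory overhead mentioned later in the thread.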

In [1]: import km3io

In [2]: f = km3io.OfflineReader("/sps/km3net/users/mlincett/online_ev/ORCAKM3Online.00000049_00007289.root")

In [3]: f.usr
Out[3]: <km3io.offline.Usr at 0x7fd62ea519d0>

In [4]: print(f.usr)
RecoQuality: [85.45957235835593 68.74744265572737 50.18704013646688 171.1478503932958 200.78442974966975 53.32798196456895 61.732716325589635 ...]
RecoNDF: [37.0 37.0 29.0 88.0 142.0 35.0 23.0 ...]
CoC: [118.6302815337638 44.33580521344907 99.93916717621543 121.24120434153483 170.12331367339033 179.3311394584781 185.94014516280558 ...]
ToT: [825.0 781.0 318.0 1538.0 3264.0 263.0 480.0 ...]
ChargeAbove: [176.0 278.0 53.0 53.0 162.0 143.0 480.0 ...]
ChargeBelow: [649.0 503.0 265.0 1485.0 3102.0 120.0 0.0 ...]
ChargeRatio: [0.21333333333333335 0.3559539052496799 0.16666666666666666 0.03446033810143043 0.04963235294117647 0.5437262357414449 1.0 ...]
DeltaPosZ: [37.51967774166617 -10.280346193553832 13.67595659707355 46.9253540172532 47.63365798691288 9.285015126722357 -15.017636151772649 ...]
FirstPartPosZ: [135.29499707179326 41.46665612378939 107.39596803432326 151.2656996531668 190.10519874250065 187.39500229409555 167.80236158102824 ...]
LastPartPosZ: [97.77531933012709 51.747002317343224 93.72001143724971 104.3403456359136 142.47154075558777 178.1099871673732 182.81999773280089 ...]
NSnapHits: [51.0 107.0 98.0 138.0 235.0 67.0 98.0 ...]
NTrigHits: [30.0 32.0 14.0 56.0 122.0 11.0 19.0 ...]
NTrigDOMs: [7.0 11.0 7.0 21.0 30.0 4.0 10.0 ...]
NTrigLines: [6.0 5.0 4.0 4.0 6.0 3.0 5.0 ...]
NSpeedVetoHits: [0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...]
NGeometryVetoHits: [0.0 0.0 0.0 7.0 30.0 2.0 4.0 ...]
ClassficationScore: [0.16863382173469108 0.17944356593281038 0.08155750660727408 0.14276457019987057 0.2059551962374179 0.0825460962366195 0.11957326403954617 ...]

In [5]: f.usr.DeltaPosZ
Out[5]: <ChunkedArray [37.51967774166617 -10.280346193553832 13.67595659707355 46.9253540172532 47.63365798691288 9.285015126722357 -15.017636151772649 ...] at 0x7fd5f2a18f50>

In [6]: f.usr.DeltaPosZ[-10:-5]
Out[8]: <ChunkedArray [33.78244971851398 34.60089397079446 6.075903244822776 46.88297946868043 9.532229417137984] at 0x7fd62eb8af10>

closed via merge request !20 (merged)

mentioned in commit 82373a23

I released that in v0.9.0, which is currently also deployed to CC Lyon.

Let me know if you need some tweaks. The first call to f.usr takes a few seconds due to the structure of the tree, but afterwards it uses an optimised lookup with a very low memory overhead.

I am afraid that processing multiple files at once will not be very sustainable. Is there any way this lookup table can be reused when opening a new file? I see this could get hacky and dangerous, as one must be sure that all the files have the same structure :/

Yes, I can make it reusable, but I'm not sure we really want that. Isn't the overhead negligible compared to the stuff you do with the files?

I measure something like 1 second per usr access... ~99% of that is the readout of the whole array, not the map.

See here, this is the "structure" readout: https://git.km3net.de/km3py/km3io/blob/57e2e0f15657487f6979cea6fe1aa9d656bc1b50/km3io/offline.py#L478

Actually, the big overhead is the read-in of the nested structure.

I'd kindly ask you to do some usability tests first, before we optimise the wrong thing. The structure of the usr data format is not ideal at all, so there is not much one can optimise there. I already used a trick to examine only the first event, but technically this is wrong, as it assumes that the usr structure is the same for every event. Although this is an unwritten rule, it could be broken in the future.

That's another reason why I dislike this design... If I do not assume this rule, the performance is horrible (and the same would be true in any other language; it is not specific to Python).
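To make the trade-off concrete, here is a hypothetical sketch contrasting the two strategies: building the name-to-index map from the first event only (optionally verifying every event, which costs a pass over all of them), versus resolving the key separately in every event, which needs no assumption but repeats the work per event. The helper names and toy values are made up for illustration.

```python
from typing import Dict, List, Optional, Sequence


def lookup_from_first_event(
    names_per_event: Sequence[Sequence[str]], verify: bool = False
) -> Dict[str, int]:
    """Hypothetical helper: build name -> index from event 0 only.

    With verify=True every event is compared against event 0, which removes
    the "same layout everywhere" assumption at the cost of touching all events.
    """
    first = list(names_per_event[0])
    if verify:
        for i, names in enumerate(names_per_event):
            if list(names) != first:
                raise ValueError("event %d has a different usr layout" % i)
    return {name: i for i, name in enumerate(first)}


def field_without_assumption(
    names_per_event: Sequence[Sequence[str]],
    values_per_event: Sequence[Sequence[float]],
    key: str,
) -> List[Optional[float]]:
    """Hypothetical helper: resolve `key` separately in every event (None if absent)."""
    result = []
    for names, values in zip(names_per_event, values_per_event):
        lookup = dict(zip(names, values))  # rebuilt for every single event
        result.append(lookup.get(key))
    return result


same = [["RecoNDF", "DeltaPosZ"], ["RecoNDF", "DeltaPosZ"]]
swapped = [["RecoNDF", "DeltaPosZ"], ["DeltaPosZ", "RecoNDF"]]
print(lookup_from_first_event(same, verify=True))  # {'RecoNDF': 0, 'DeltaPosZ': 1}
print(field_without_assumption(swapped, [[37.0, 37.52], [-10.28, 37.0]], "DeltaPosZ"))
# [37.52, -10.28]
```

Note that lookup_from_first_event(swapped) without verification silently returns an index that is wrong for the second event, which is exactly the risk of relying on the unwritten rule.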

I am convinced. Closing again :) Thanks very much!

reopened

closed

Alright!

However, I'll still make the readout lazy, so that each usr field will be loaded only when it's accessed. I think this will get rid of the whole lag.

Here are some numbers: opening the file takes 260 ms, the first access to any usr field takes roughly 2.5 s, and any other access (no matter which field) takes just ~40 µs.

Notice that usr is now nested under the corresponding branch, so it is f.events.usr rather than f.usr.

Everything is on the latest 37-user-parameters-seem-to-be-transposed branch.

In [1]: %time import km3io
CPU times: user 1.89 s, sys: 6.35 s, total: 8.24 s
Wall time: 3.25 s

In [2]: %time f = km3io.OfflineReader("/sps/km3net/users/mlincett/online_ev/ORCAKM3Online.00000049_00007289.root")
CPU times: user 262 ms, sys: 126 µs, total: 262 ms
Wall time: 261 ms

In [3]: %time f.events.usr.DeltaPosZ
CPU times: user 2.37 s, sys: 80.9 ms, total: 2.46 s
Wall time: 2.45 s
Out[3]: <ChunkedArray [37.51967774166617 -10.280346193553832 13.67595659707355 46.9253540172532 47.63365798691288 9.285015126722357 -15.017636151772649 ...] at 0x7fddf89561d0>

In [4]: %time f.events.usr.DeltaPosZ
CPU times: user 18 µs, sys: 0 ns, total: 18 µs
Wall time: 30 µs
Out[4]: <ChunkedArray [37.51967774166617 -10.280346193553832 13.67595659707355 46.9253540172532 47.63365798691288 9.285015126722357 -15.017636151772649 ...] at 0x7fddf89561d0>

In [5]: %time f.events.usr.RecoQuality
CPU times: user 11 µs, sys: 12 µs, total: 23 µs
Wall time: 40.5 µs
Out[5]: <ChunkedArray [85.45957235835593 68.74744265572737 50.18704013646688 171.1478503932958 200.78442974966975 53.32798196456895 61.732716325589635 ...] at 0x7fddf89ee590>

I think that looks fine.
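The timings above (nothing read when the file is opened, one expensive first access, then tens of microseconds for any field) fit a lazy-and-cached pattern. A minimal sketch, assuming a hypothetical loader callable that returns the names and per-event rows; it is not km3io's actual code, and the fake loader's values are rounded from the printout in this thread.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Loader = Callable[[], Tuple[Sequence[str], Sequence[Sequence[float]]]]


class LazyUsr:
    """Hypothetical lazy usr accessor (a sketch, not km3io's implementation).

    Nothing is read in __init__; the first field access calls `loader` once and
    caches every column, so later accesses to any field are plain dict lookups.
    """

    def __init__(self, loader: Loader):
        self._loader = loader
        self._columns: Optional[Dict[str, List[float]]] = None

    def _load(self) -> Dict[str, List[float]]:
        if self._columns is None:
            names, rows = self._loader()  # the expensive part, runs at most once
            self._columns = {
                name: [row[i] for row in rows] for i, name in enumerate(names)
            }
        return self._columns

    def __getattr__(self, name: str) -> List[float]:
        # Private names are excluded so internal attributes never recurse here
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._load()[name]
        except KeyError:
            raise AttributeError(name) from None


calls = []


def fake_loader():
    # Stand-in for reading the usr branch from the file
    calls.append(1)
    return ["DeltaPosZ", "RecoQuality"], [[37.52, 85.46], [-10.28, 68.75]]


usr = LazyUsr(fake_loader)
print(len(calls))       # 0: creating the object reads nothing
print(usr.DeltaPosZ)    # [37.52, -10.28], triggers the single read
print(usr.RecoQuality)  # [85.46, 68.75], served from the cache
print(len(calls))       # 1
```

Reading all columns on the first access matches the measured behaviour; a truly per-field read would only pay off if the underlying storage allowed extracting one field without decoding the whole nested usr structure.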

@mlincetto I got it down to < 400 ms for the first call. On my Xeon desktop it's 130 ms.

In [1]: %time import km3io
CPU times: user 1.73 s, sys: 6.19 s, total: 7.92 s
Wall time: 2.56 s

In [2]: %time f = km3io.OfflineReader("/sps/km3net/users/mlincett/online_ev/ORCAKM3Online.00000049_00007289.root")
CPU times: user 269 ms, sys: 10.5 ms, total: 279 ms
Wall time: 278 ms

In [3]: %time f.events.usr.DeltaPosZ
CPU times: user 348 ms, sys: 28.4 ms, total: 376 ms
Wall time: 374 ms
Out[3]: <ChunkedArray [37.51967774166617 -10.280346193553832 13.67595659707355 46.9253540172532 47.63365798691288 9.285015126722357 -15.017636151772649 ...] at 0x7f99e6507690>

In [4]: %time f.events.usr.RecoQuality
CPU times: user 8 µs, sys: 14 µs, total: 22 µs
Wall time: 40.5 µs
Out[4]: <ChunkedArray [85.45957235835593 68.74744265572737 50.18704013646688 171.1478503932958 200.78442974966975 53.32798196456895 61.732716325589635 ...] at 0x7f99e6517b50>

In [5]: %time f.events.usr.ChargeAbove
CPU times: user 11 µs, sys: 14 µs, total: 25 µs
Wall time: 42.2 µs
Out[5]: <ChunkedArray [176.0 278.0 53.0 53.0 162.0 143.0 480.0 ...] at 0x7f99e6507790>
