premiere.org 13.52 KiB
km3io
Export Options
Default
(setq org-export-exclude-tags '("noexport"))
Presentation
(setq org-export-exclude-tags '("noexport" "noexportpresentation"))
Prerequisites
python3 -m venv ~/tmp/km3io-premiere-venv
. ~/tmp/km3io-premiere-venv/bin/activate
# pip install km3io==0.8.1
pip install -e ~/Dev/km3io
(setq org-babel-python-command "~/tmp/km3io-premiere-venv/bin/python")
km3io
- km3io: a tiny Python package with minimal dependencies to read KM3NeT ROOT files
- Goal: provide a **standalone**, **independent** access to KM3NeT data
- Uses the uproot library to access ROOT data
- Provides convenient wrapper classes
- Maximum performance due to numpy and numba
- Data are read lazily:
- only loaded into memory when directly accessed
- apply several cut masks on huge datasets without reading them into the memory
uproot
- Describe the projec
- describe Scikit-HEP
- thanks to Jim
- etc.
Installation
- Dependencies:
- Python 3.5+
- uproot (a small Python package, installed automatically via
pip
) - no binaries!
- No ROOT, Jpp or aanet required to read ROOT files
- Releases are published on the official Python package repository:
pip install km3io
Why is it so cool?
- Runs on Linux, macOS, Windows, as long as Python 3.5+ is installed
- Every data is a
numpy
array orawkward
array (numpy
compatible array of complex data structures)
awkward arrays?
- some details on it
- maybe the link to the talk which Jim gave on a HEP conference about awkward arrays
Accessing Online (DAQ) Data
km3io supports the following DAQ datatypes
-
JDAQEvent
(the event dataformat)- header information
- snapshot hits
- triggered hits
-
JDAQSummaryslices
- header information
- accessor methods for bit flags (HRV, FIFO, UDP status)
-
JDAQTimeslices
- header information
- frame hits
- currently only L1, L2 and SN streams, the L0 stream is work in progress
Examples
Opening a run file
import km3io
# A run from iRODS
f = km3io.DAQReader("KM3NeT_00000044_00006880.root")
print(f.events)
print(f.summaryslices)
print(f.timeslices)
Number of events: 115038 Number of summaryslices: 182668 Available timeslice streams: L1, SN
Investigating timeslice frames
a_timeslice = f.timeslices.stream("L1", 23)
print(a_timeslice.frames.keys())
dict_keys([806451572, 806455814, 806465101, 806483369, 806487219, 806487226, 806487231, 808432835, 808435278, 808447180, 808447186, 808451904, 808451907, 808469129, 808472260, 808472265, 808488895, 808488990, 808489014, 808489117, 808493910, 808946818, 808949744, 808951460, 808956908, 808959411, 808961448, 808961480, 808961504, 808961655, 808964815, 808964852, 808964883, 808964908, 808969848, 808969857, 808972593, 808972598, 808972698, 808974758, 808974773, 808974811, 808974972, 808976377, 808979567, 808979721, 808979729, 808981510, 808981523, 808981672, 808981812, 808981864, 808982005, 808982018, 808982041, 808982066, 808982077, 808982547, 808984711, 808996773, 808997793, 809006037, 809007627, 809503416, 809521500, 809524432, 809526097, 809544058, 809544061])
Reading the first 100 ToTs and channel IDs of a frame sent by the DOM 806451572
print(a_timeslice.frames[806451572].tot[:42])
print(a_timeslice.frames[806451572].pmt[:42])
[ 4 19 3 35 29 21 1 22 6 6 29 21 29 26 3 27 11 4 27 29 13 23 4 28 21 24 3 10 25 23 28 25 9 6 14 3 10 25 11 31 10 2] [27 27 14 14 18 22 13 13 30 30 12 10 27 13 7 7 15 15 27 11 23 23 12 12 18 22 29 29 21 8 1 7 9 9 6 6 23 23 25 26 10 10]
Checking the number of UDP packets in summary slices
- functions to parse binary masks and bit positions from the KM3NeT format definitions
f = km3io.DAQReader("KM3NeT_00000044_00006880.root")
sumslice = f.summaryslices.slices[23]
print(sumslice.dom_id)
print(km3io.daq.get_number_udp_packets(sumslice.dq_status))
[806451572 806483369 806487231 808435278 808447180 808451907 808472265 808488895 808489014 808489117 808493910 808946818 808949744 808951460 808956908 808959411 808961448 808961504 808961655 808964815 808964883 808964908 808969848 808969857 808972593 808972598 808972698 808974972 808976377 808979721 808979729 808981510 808981523 808981672 808981812 808981864 808982005 808982018 808982041 808982066 808982547 808984711 808996773 808997793 809006037 809007627 809521500 809524432 809544058] [17 17 16 16 25 16 27 17 18 17 21 16 16 16 34 18 18 18 17 18 16 18 15 17 20 18 15 17 17 19 16 18 16 17 17 16 18 18 17 27 18 20 16 17 15 18 17 17 17]
Offline (MC/reco) Data
Reading offline files (aka aanet-ROOT files)
- Events
- header information
- hits
- MC information
- MC tracks
- MC hits
- Reco information
- tracks
- reconstruction info and parameters
Opening a reconstructed MUPAGE file
f = km3io.OfflineReader("mc.root")
print(f)
<km3io.offline.OfflineReader object at 0x1155bde50>
Investigating events and tracks
print(f.events)
Number of events: 10
print(f.tracks.lik)
print(f.tracks.dir_z)
[[294.6407542676734 294.6407542676734 294.6407542676734 ... 67.81221253265059 67.7756405143316 67.77250505700384] [96.75133289411137 96.75133289411137 96.75133289411137 ... 39.21916536442286 39.184645826013806 38.870325146341884] [560.2775306614813 560.2775306614813 560.2775306614813 ... 118.88577278801066 118.72271313687405 117.80785995187605] ... [71.03251451148226 71.03251451148226 71.03251451148226 ... 16.714140573909347 16.444395245214945 16.34639241716669] [326.440133294878 326.440133294878 326.440133294878 ... 87.79818671079849 87.75488082571873 87.74839444768625] [159.77779654216795 159.77779654216795 159.77779654216795 ... 33.8669134999348 33.821631538334984 33.77240735670646]] [[-0.872885221293917 -0.872885221293917 -0.872885221293917 ... -0.6631226836266504 -0.5680647731737454 -0.5680647731737454] [-0.8351996698137462 -0.8351996698137462 -0.8351996698137462 ... -0.7485107718446855 -0.8229838871876581 -0.239315690284641] [-0.989148723802379 -0.989148723802379 -0.989148723802379 ... -0.9350162572437829 -0.88545604390297 -0.88545604390297] ... [-0.5704611045902105 -0.5704611045902105 -0.5704611045902105 ... -0.9350162572437829 -0.4647231989130516 -0.4647231989130516] [-0.9779941383490359 -0.9779941383490359 -0.9779941383490359 ... -0.88545604390297 -0.88545604390297 -0.8229838871876581] [-0.7396916780974963 -0.7396916780974963 -0.7396916780974963 ... -0.6631226836266504 -0.7485107718446855 -0.7485107718446855]]
Some pretty print features for single objects
Hits
print(f[0].hits[1])
offline hit: id : 0 dom_id : 806451572 channel_id : 9 tdc : 0 tot : 30 trig : 1 pmt_id : 0 t : 70104016.0 a : 0.0 pos_x : 0.0 pos_y : 0.0 pos_z : 0.0 dir_x : 0.0 dir_y : 0.0 dir_z : 0.0 pure_t : 0.0 pure_a : 0.0 type : 0 origin : 0 pattern_flags : 0
Tracks
print(f[0].tracks[0])
offline track: fUniqueID : 0 fBits : 33554432 id : 1 pos_x : 445.835395997812 pos_y : 615.1089636184813 pos_z : 125.1448339836911 dir_x : 0.0368711082700674 dir_y : -0.48653048395923415 dir_z : -0.872885221293917 t : 70311446.46401498 E : 99.10458562488608 len : 0.0 lik : 294.6407542676734 type : 0 rec_type : 4000 rec_stages : [1, 3, 5, 4] status : 0 mother_id : -1 hit_ids : [] error_matrix : [] comment : 0 JGANDALF_BETA0_RAD : 0.004957442219414389 JGANDALF_BETA1_RAD : 0.003417848024252858 JGANDALF_CHI2 : -294.6407542676734 JGANDALF_NUMBER_OF_HITS : 142.0 JENERGY_ENERGY : 99.10458562488608 JENERGY_CHI2 : 1.7976931348623157e+308 JGANDALF_LAMBDA : 4.2409761837248484e-12 JGANDALF_NUMBER_OF_ITERATIONS : 10.0 JSTART_NPE_MIP : 24.88469697331908 JSTART_NPE_MIP_TOTAL : 55.88169412579765 JSTART_LENGTH_METRES : 98.89582506402911 JVETO_NPE : 0.0 JVETO_NUMBER_OF_HITS : 0.0 JENERGY_MUON_RANGE_METRES : 344.9767431592819 JENERGY_NOISE_LIKELIHOOD : -333.87773581129136 JENERGY_NDF : 1471.0 JENERGY_NUMBER_OF_HITS : 101.0
Extracting the energy of every first reco track in each event
# from irods:mc/v5.2/mcv5.2.mupage_10T.sirene.jte.1186.root
f = km3io.OfflineReader("mupage.root")
print(f.events)
# number of tracks per event
print(f.mc_tracks.E.counts)
mask = f.mc_tracks.E.counts > 0
print(f.mc_tracks.E[mask, 0])
ORCA DU4 RBR Analysis Example
A tiny function to extract track attributes from a list of files
def extract_features(files, features):
"""Return a dict with the features from every best reco track"""
data = defaultdict(list)
for f in tqdm(files):
tracks = km3io.OfflineReader(f).tracks
mask = tracks.len.counts > 0
for feature in features:
data[feature].append(getattr(tracks, feature)[mask, 0])
return {k: np.hstack(v) for k, v in data.items()}
Extracting E
, lik
, pos[xyz]
and dir[xyz]
- Only takes a few seconds per file
- Results are numpy arrays
sea_files = glob("data/reco-sea/*aanet*.root")
features = ['E', 'lik', *[e + '_' + q for q in 'xyz' for e in ['pos', 'dir']]]
sea_data = extract_features(sea_files, features)
Plotting some data with matplotlib
fig, ax = plt.subplots()
plot_options = {
'histtype': 'step',
'bins': 100,
'log': True,
'linewidth': 2
}
ax.hist(sea_data['E'], label="sea data", **plot_options)
ax.hist(mc_data['E'], label="atm. muons MC (JSirene)", **plot_options)
ax.set_xlabel("energy / GeV")
ax.legend(); ax.grid();
Command line tool(s)
- We are working on some counter parts to the Jpp tools
-
KPrintTree -f FILENAME
: similar toJPrintTree
- more to come (feel free to request or contribute)
-
Thanks
- Zineb Aly (CPPM)
- Tamas Gal (ECAP)
- Johannes Schumann (ECAP)