premiere.org 15.81 KiB
km3io
Export Options
Default
(setq org-export-exclude-tags '("noexport"))
Presentation
(setq org-export-exclude-tags '("noexport" "noexportpresentation"))
Prerequisites
python3 -m venv ~/tmp/km3io-premiere-venv
. ~/tmp/km3io-premiere-venv/bin/activate
pip install km3io==0.8.2
(setq org-babel-python-command "~/tmp/km3io-premiere-venv/bin/python")
km3io
- km3io: a tiny Python package with minimal dependencies to read KM3NeT ROOT files
- Goal: provide a **standalone**, **independent** access to KM3NeT data
- Uses the uproot Python library to access ROOT data
- Maximum performance due to numpy and numba
- 100% test coverage
- Automated testing for Python 3.5, 3.6, 3.7 and 3.8
Data in km3io
- data are read lazily using the awkwardarray Python library
- only loaded when directly accessed
- apply cut masks on huge datasets without loading them (wholemeal, database-like workflow)
- compatible with pandas
uproot
- ROOT I/O (read/write) in pure Python and Numpy
- Created by SciKit-HEP (https://scikit-hep.org)
“The Scikit-HEP project is a community-driven and community-oriented project with the aim of providing Particle Physics at large with an ecosystem for data analysis in Python. The project started in Autumn 2016 and is in full swing.”
- Highly recommended if you live in the Python world
- Unlike
PyROOT
androot_numpy
,uproot
does not depend on C++ ROOT - Very helpful developers (Jim Pivarski, one of the main authors helped a lot to parse KM3NeT ROOT files and we also contributed to uproot)
- The rate of reading data into arrays with
uproot
is shown to be faster than C++ ROOT,PyROOT
orroot_numpy
- It’s fast!!!
uproot rate / ROOT rate
Source: https://github.com/scikit-hep/uproot/blob/master/README.rst
uproot rate / root_numpy
rate
Source: https://github.com/scikit-hep/uproot/blob/master/README.rst
awkward arrays?
- “Manipulate arrays of complex data structures as easily as Numpy.”
- Variable-length lists (jagged/ragged), deeply nested (record structure), different data types in the same list, etc.
- https://github.com/scikit-hep/awkward-array
- A recommended talk (by Jim himself) on this topic in the HEP context: https://www.youtube.com/watch?v=2NxWpU7NArk
-
awkward v1.0
being rewritten in C++
Installation of km3io
- Dependencies:
- Python 3.5+
- uproot (a small Python package, installed automatically via
pip
) - no binaries!
- No ROOT, Jpp or aanet required to read ROOT files
- Releases are published on the official Python package repository
pip install km3io
Why is it so cool?
- Runs on Linux, macOS, Windows, as long as Python 3.5+ is installed
- Every data is a
numpy
array orawkward
array (numpy
compatible array of complex data structures)
Online (DAQ) Data
km3io supports the following DAQ datatypes
-
JDAQEvent
(the event dataformat)- header information
- snapshot hits
- triggered hits
-
JDAQSummaryslices
- header information
- rates
- accessor methods for bit flags (HRV, FIFO, UDP status)
-
JDAQTimeslices
- header information
- frame hits
- currently only L1, L2 and SN streams
- L0 stream is work in progress
Examples
Opening a run file
import km3io
# A run from iRODS
f = km3io.DAQReader("KM3NeT_00000044_00006880.root")
print(f.events)
print(f.summaryslices)
print(f.timeslices)
Number of events: 115038 Number of summaryslices: 182668 Available timeslice streams: SN, L1
Investigating timeslice frames
a_timeslice = f.timeslices.stream("L1", 23)
print(a_timeslice.frames.keys())
dict_keys([806451572, 806455814, 806465101, 806483369, 806487219, 806487226, 806487231, 808432835, 808435278, 808447180, 808447186, 808451904, 808451907, 808469129, 808472260, 808472265, 808488895, 808488990, 808489014, 808489117, 808493910, 808946818, 808949744, 808951460, 808956908, 808959411, 808961448, 808961480, 808961504, 808961655, 808964815, 808964852, 808964883, 808964908, 808969848, 808969857, 808972593, 808972598, 808972698, 808974758, 808974773, 808974811, 808974972, 808976377, 808979567, 808979721, 808979729, 808981510, 808981523, 808981672, 808981812, 808981864, 808982005, 808982018, 808982041, 808982066, 808982077, 808982547, 808984711, 808996773, 808997793, 809006037, 809007627, 809503416, 809521500, 809524432, 809526097, 809544058, 809544061])
Reading the first 42 ToTs and channel IDs of a frame sent by the DOM 806451572
print(a_timeslice.frames[806451572].tot[:42])
print(a_timeslice.frames[806451572].pmt[:42])
[33 16 22 24 7 27 4 31 31 15 5 26 30 24 7 26 26 26 27 15 7 3 63 28 26 30 25 24 20 7 23 6 22 22 26 15 29 25 24 22 23 21] [ 0 9 4 12 13 2 10 10 8 9 27 27 28 18 27 2 6 15 4 2 2 12 27 3 15 10 23 14 19 9 9 24 24 6 7 20 7 20 27 22 24 25]
Checking the number of UDP packets in summary slices
- functions to parse binary masks and bit positions from the KM3NeT format definitions
f = km3io.DAQReader("KM3NeT_00000044_00006880.root")
sumslice = f.summaryslices.slices[23]
print(sumslice.dom_id)
print(km3io.daq.get_number_udp_packets(sumslice.dq_status))
[806451572 806483369 806487231 808435278 808447180 808451907 808472265 808488895 808489014 808489117 808493910 808946818 808949744 808951460 808956908 808959411 808961448 808961504 808961655 808964815 808964883 808964908 808969848 808969857 808972593 808972598 808972698 808974972 808976377 808979721 808979729 808981510 808981523 808981672 808981812 808981864 808982005 808982018 808982041 808982066 808982547 808984711 808996773 808997793 809006037 809007627 809521500 809524432 809544058] [17 17 16 16 25 16 27 17 18 17 21 16 16 16 34 18 18 18 17 18 16 18 15 17 20 18 15 17 17 19 16 18 16 17 17 16 18 18 17 27 18 20 16 17 15 18 17 17 17]
Offline (MC/reco) Data
Reading offline files (aka aanet-ROOT files)
- Events
- header information
- hits
- MC information
- MC tracks
- MC hits
- Reco information
- tracks
- reconstruction info and parameters
Opening a reconstructed MUPAGE file
f = km3io.OfflineReader("mc.root")
print(f)
<km3io.offline.OfflineReader object at 0x10b267f50>
Investigating events and tracks
print(f.events)
Number of events: 10
print(f.tracks.lik)
print(f.tracks.dir_z)
[[294.6407542676734 294.6407542676734 294.6407542676734 ... 67.81221253265059 67.7756405143316 67.77250505700384] [96.75133289411137 96.75133289411137 96.75133289411137 ... 39.21916536442286 39.184645826013806 38.870325146341884] [560.2775306614813 560.2775306614813 560.2775306614813 ... 118.88577278801066 118.72271313687405 117.80785995187605] ... [71.03251451148226 71.03251451148226 71.03251451148226 ... 16.714140573909347 16.444395245214945 16.34639241716669] [326.440133294878 326.440133294878 326.440133294878 ... 87.79818671079849 87.75488082571873 87.74839444768625] [159.77779654216795 159.77779654216795 159.77779654216795 ... 33.8669134999348 33.821631538334984 33.77240735670646]] [[-0.872885221293917 -0.872885221293917 -0.872885221293917 ... -0.6631226836266504 -0.5680647731737454 -0.5680647731737454] [-0.8351996698137462 -0.8351996698137462 -0.8351996698137462 ... -0.7485107718446855 -0.8229838871876581 -0.239315690284641] [-0.989148723802379 -0.989148723802379 -0.989148723802379 ... -0.9350162572437829 -0.88545604390297 -0.88545604390297] ... [-0.5704611045902105 -0.5704611045902105 -0.5704611045902105 ... -0.9350162572437829 -0.4647231989130516 -0.4647231989130516] [-0.9779941383490359 -0.9779941383490359 -0.9779941383490359 ... -0.88545604390297 -0.88545604390297 -0.8229838871876581] [-0.7396916780974963 -0.7396916780974963 -0.7396916780974963 ... -0.6631226836266504 -0.7485107718446855 -0.7485107718446855]]
Some pretty print features for single objects
Hits
print(f[0].hits[1])
offline hit: id : 0 dom_id : 806451572 channel_id : 9 tdc : 0 tot : 30 trig : 1 pmt_id : 0 t : 70104016.0 a : 0.0 pos_x : 0.0 pos_y : 0.0 pos_z : 0.0 dir_x : 0.0 dir_y : 0.0 dir_z : 0.0 pure_t : 0.0 pure_a : 0.0 type : 0 origin : 0 pattern_flags : 0
Tracks
print(f[0].tracks[0])
offline track: fUniqueID : 0 fBits : 33554432 id : 1 pos_x : 445.835395997812 pos_y : 615.1089636184813 pos_z : 125.1448339836911 dir_x : 0.0368711082700674 dir_y : -0.48653048395923415 dir_z : -0.872885221293917 t : 70311446.46401498 E : 99.10458562488608 len : 0.0 lik : 294.6407542676734 type : 0 rec_type : 4000 rec_stages : [1, 3, 5, 4] status : 0 mother_id : -1 hit_ids : [] error_matrix : [] comment : 0 JGANDALF_BETA0_RAD : 0.004957442219414389 JGANDALF_BETA1_RAD : 0.003417848024252858 JGANDALF_CHI2 : -294.6407542676734 JGANDALF_NUMBER_OF_HITS : 142.0 JENERGY_ENERGY : 99.10458562488608 JENERGY_CHI2 : 1.7976931348623157e+308 JGANDALF_LAMBDA : 4.2409761837248484e-12 JGANDALF_NUMBER_OF_ITERATIONS : 10.0 JSTART_NPE_MIP : 24.88469697331908 JSTART_NPE_MIP_TOTAL : 55.88169412579765 JSTART_LENGTH_METRES : 98.89582506402911 JVETO_NPE : 0.0 JVETO_NUMBER_OF_HITS : 0.0 JENERGY_MUON_RANGE_METRES : 344.9767431592819 JENERGY_NOISE_LIKELIHOOD : -333.87773581129136 JENERGY_NDF : 1471.0 JENERGY_NUMBER_OF_HITS : 101.0
Extracting the energy of every first reco track in each event
# from irods:mc/v5.2/mcv5.2.mupage_10T.sirene.jte.1186.root
f = km3io.OfflineReader("mupage.root")
print(f.events)
# number of tracks per event
print(f.mc_tracks.E.counts)
mask = f.mc_tracks.E.counts > 0
print(f.mc_tracks.E[mask, 0])
Number of events: 12236 [11 2 3 ... 10 1 4] [17.72 73.213 10884.78 1694.332 1221.061 22945.123 11019.418 ...]
ORCA DU4 RBR Analysis Example
A tiny function to extract track attributes from a list of files
def extract_features(files, features):
"""Gather features from the best reco tracks"""
data = defaultdict(list)
for f in tqdm(files):
tracks = km3io.OfflineReader(f).tracks
mask = tracks.len.counts > 0
for feature in features:
data[feature].append(getattr(tracks, feature)[mask, 0])
return {k: np.hstack(v) for k, v in data.items()}
Extracting E
, lik
, pos[xyz]
and dir[xyz]
- Only takes a few seconds per file
- Results are numpy arrays
sea_files = glob("data/reco-sea/*aanet*.root")
mc_files = glob("data/reco-rbr-muatm/*sirene*aanet*.root")
features = ['E', 'lik', *[e + '_' + q for q in 'xyz' for e in ['pos', 'dir']]]
sea_data = extract_features(sea_files, features)
mc_data = extract_features(mc_files, features)
Plotting some data with matplotlib
fig, ax = plt.subplots()
plot_options = {
'histtype': 'step',
'bins': 100,
'log': True,
'linewidth': 2
}
ax.hist(sea_data['E'], label="sea data", **plot_options)
ax.hist(mc_data['E'], label="atm. muons MC (JSirene)", **plot_options)
ax.set_xlabel("energy / GeV")
ax.legend(); ax.grid();
Energy Distribution Comparison (example)
Command line tool(s)
- We are working on some counter parts to the Jpp tools
-
KPrintTree -f FILENAME
(similar toJPrintTree
) - more to come (feel free to request or contribute)
-
Thanks
- Zineb Aly (CPPM)
- Tamas Gal (ECAP)
- Johannes Schumann (ECAP)