Skip to content
Snippets Groups Projects
premiere.org 15.81 KiB

km3io

Export Options

Default

(setq org-export-exclude-tags '("noexport"))

Presentation

(setq org-export-exclude-tags '("noexport" "noexportpresentation"))

Prerequisites

python3 -m venv ~/tmp/km3io-premiere-venv
. ~/tmp/km3io-premiere-venv/bin/activate
pip install km3io==0.8.2
(setq org-babel-python-command "~/tmp/km3io-premiere-venv/bin/python")

km3io

  • km3io: a tiny Python package with minimal dependencies to read KM3NeT ROOT files
  • Goal: provide a **standalone**, **independent** access to KM3NeT data
  • Uses the uproot Python library to access ROOT data
  • Maximum performance due to numpy and numba
  • 100% test coverage
  • Automated testing for Python 3.5, 3.6, 3.7 and 3.8

Data in km3io

  • data are read lazily using the awkwardarray Python library
  • only loaded when directly accessed
  • apply cut masks on huge datasets without loading them (wholemeal, database-like workflow)
  • compatible with pandas

uproot

  • ROOT I/O (read/write) in pure Python and Numpy
  • Created by SciKit-HEP (https://scikit-hep.org)

    “The Scikit-HEP project is a community-driven and community-oriented project with the aim of providing Particle Physics at large with an ecosystem for data analysis in Python. The project started in Autumn 2016 and is in full swing.”

  • Highly recommended if you live in the Python world
  • Unlike PyROOT and root_numpy, uproot does not depend on C++ ROOT
  • Very helpful developers (Jim Pivarski, one of the main authors helped a lot to parse KM3NeT ROOT files and we also contributed to uproot)
  • The rate of reading data into arrays with uproot is shown to be faster than C++ ROOT, PyROOT or root_numpy
  • It’s fast!!!

uproot rate / ROOT rate

images/uproot_vs_root.png

Source: https://github.com/scikit-hep/uproot/blob/master/README.rst

uproot rate / root_numpy rate

images/uproot_vs_root_numpy.png

Source: https://github.com/scikit-hep/uproot/blob/master/README.rst

awkward arrays?

  • “Manipulate arrays of complex data structures as easily as Numpy.”
  • Variable-length lists (jagged/ragged), deeply nested (record structure), different data types in the same list, etc.
  • https://github.com/scikit-hep/awkward-array
  • A recommended talk (by Jim himself) on this topic in the HEP context: https://www.youtube.com/watch?v=2NxWpU7NArk
  • awkward v1.0 being rewritten in C++

Installation of km3io

  • Dependencies:
    • Python 3.5+
    • uproot (a small Python package, installed automatically via pip)
    • no binaries!
  • No ROOT, Jpp or aanet required to read ROOT files
  • Releases are published on the official Python package repository

pip install km3io

Why is it so cool?

  • Runs on Linux, macOS, Windows, as long as Python 3.5+ is installed
  • Every data is a numpy array or awkward array (numpy compatible array of complex data structures)

Online (DAQ) Data

km3io supports the following DAQ datatypes

  • JDAQEvent (the event dataformat)
    • header information
    • snapshot hits
    • triggered hits
  • JDAQSummaryslices
    • header information
    • rates
    • accessor methods for bit flags (HRV, FIFO, UDP status)
  • JDAQTimeslices
    • header information
    • frame hits
    • currently only L1, L2 and SN streams
    • L0 stream is work in progress

Examples

Opening a run file

import km3io
# A run from iRODS
f = km3io.DAQReader("KM3NeT_00000044_00006880.root")
print(f.events)
print(f.summaryslices)
print(f.timeslices)
Number of events: 115038
Number of summaryslices: 182668
Available timeslice streams: SN, L1

Investigating timeslice frames

a_timeslice = f.timeslices.stream("L1", 23)
print(a_timeslice.frames.keys())
dict_keys([806451572, 806455814, 806465101, 806483369, 806487219, 806487226, 806487231, 808432835, 808435278, 808447180, 808447186, 808451904, 808451907, 808469129, 808472260, 808472265, 808488895, 808488990, 808489014, 808489117, 808493910, 808946818, 808949744, 808951460, 808956908, 808959411, 808961448, 808961480, 808961504, 808961655, 808964815, 808964852, 808964883, 808964908, 808969848, 808969857, 808972593, 808972598, 808972698, 808974758, 808974773, 808974811, 808974972, 808976377, 808979567, 808979721, 808979729, 808981510, 808981523, 808981672, 808981812, 808981864, 808982005, 808982018, 808982041, 808982066, 808982077, 808982547, 808984711, 808996773, 808997793, 809006037, 809007627, 809503416, 809521500, 809524432, 809526097, 809544058, 809544061])

Reading the first 42 ToTs and channel IDs of a frame sent by the DOM 806451572

print(a_timeslice.frames[806451572].tot[:42])
print(a_timeslice.frames[806451572].pmt[:42])
[33 16 22 24  7 27  4 31 31 15  5 26 30 24  7 26 26 26 27 15  7  3 63 28
 26 30 25 24 20  7 23  6 22 22 26 15 29 25 24 22 23 21]
[ 0  9  4 12 13  2 10 10  8  9 27 27 28 18 27  2  6 15  4  2  2 12 27  3
 15 10 23 14 19  9  9 24 24  6  7 20  7 20 27 22 24 25]

Checking the number of UDP packets in summary slices

  • functions to parse binary masks and bit positions from the KM3NeT format definitions
f = km3io.DAQReader("KM3NeT_00000044_00006880.root")
sumslice = f.summaryslices.slices[23]
print(sumslice.dom_id)
print(km3io.daq.get_number_udp_packets(sumslice.dq_status))
[806451572 806483369 806487231 808435278 808447180 808451907 808472265
 808488895 808489014 808489117 808493910 808946818 808949744 808951460
 808956908 808959411 808961448 808961504 808961655 808964815 808964883
 808964908 808969848 808969857 808972593 808972598 808972698 808974972
 808976377 808979721 808979729 808981510 808981523 808981672 808981812
 808981864 808982005 808982018 808982041 808982066 808982547 808984711
 808996773 808997793 809006037 809007627 809521500 809524432 809544058]
[17 17 16 16 25 16 27 17 18 17 21 16 16 16 34 18 18 18 17 18 16 18 15 17
 20 18 15 17 17 19 16 18 16 17 17 16 18 18 17 27 18 20 16 17 15 18 17 17
 17]

Offline (MC/reco) Data

Reading offline files (aka aanet-ROOT files)

  • Events
    • header information
    • hits
  • MC information
    • MC tracks
    • MC hits
  • Reco information
    • tracks
    • reconstruction info and parameters

Opening a reconstructed MUPAGE file

f = km3io.OfflineReader("mc.root")
print(f)
<km3io.offline.OfflineReader object at 0x10b267f50>

Investigating events and tracks

print(f.events)
Number of events: 10
print(f.tracks.lik)
print(f.tracks.dir_z)
[[294.6407542676734 294.6407542676734 294.6407542676734 ... 67.81221253265059 67.7756405143316 67.77250505700384] [96.75133289411137 96.75133289411137 96.75133289411137 ... 39.21916536442286 39.184645826013806 38.870325146341884] [560.2775306614813 560.2775306614813 560.2775306614813 ... 118.88577278801066 118.72271313687405 117.80785995187605] ... [71.03251451148226 71.03251451148226 71.03251451148226 ... 16.714140573909347 16.444395245214945 16.34639241716669] [326.440133294878 326.440133294878 326.440133294878 ... 87.79818671079849 87.75488082571873 87.74839444768625] [159.77779654216795 159.77779654216795 159.77779654216795 ... 33.8669134999348 33.821631538334984 33.77240735670646]]
[[-0.872885221293917 -0.872885221293917 -0.872885221293917 ... -0.6631226836266504 -0.5680647731737454 -0.5680647731737454] [-0.8351996698137462 -0.8351996698137462 -0.8351996698137462 ... -0.7485107718446855 -0.8229838871876581 -0.239315690284641] [-0.989148723802379 -0.989148723802379 -0.989148723802379 ... -0.9350162572437829 -0.88545604390297 -0.88545604390297] ... [-0.5704611045902105 -0.5704611045902105 -0.5704611045902105 ... -0.9350162572437829 -0.4647231989130516 -0.4647231989130516] [-0.9779941383490359 -0.9779941383490359 -0.9779941383490359 ... -0.88545604390297 -0.88545604390297 -0.8229838871876581] [-0.7396916780974963 -0.7396916780974963 -0.7396916780974963 ... -0.6631226836266504 -0.7485107718446855 -0.7485107718446855]]

Some pretty print features for single objects

Hits

print(f[0].hits[1])
offline hit:
	id                  :               0
	dom_id              :       806451572
	channel_id          :               9
	tdc                 :               0
	tot                 :              30
	trig                :               1
	pmt_id              :               0
	t                   :      70104016.0
	a                   :             0.0
	pos_x               :             0.0
	pos_y               :             0.0
	pos_z               :             0.0
	dir_x               :             0.0
	dir_y               :             0.0
	dir_z               :             0.0
	pure_t              :             0.0
	pure_a              :             0.0
	type                :               0
	origin              :               0
	pattern_flags       :               0

Tracks

print(f[0].tracks[0])
offline track:
	fUniqueID                      :                           0
	fBits                          :                    33554432
	id                             :                           1
	pos_x                          :            445.835395997812
	pos_y                          :           615.1089636184813
	pos_z                          :           125.1448339836911
	dir_x                          :          0.0368711082700674
	dir_y                          :        -0.48653048395923415
	dir_z                          :          -0.872885221293917
	t                              :           70311446.46401498
	E                              :           99.10458562488608
	len                            :                         0.0
	lik                            :           294.6407542676734
	type                           :                           0
	rec_type                       :                        4000
	rec_stages                     :                [1, 3, 5, 4]
	status                         :                           0
	mother_id                      :                          -1
	hit_ids                        :                          []
	error_matrix                   :                          []
	comment                        :                           0
	JGANDALF_BETA0_RAD             :        0.004957442219414389
	JGANDALF_BETA1_RAD             :        0.003417848024252858
	JGANDALF_CHI2                  :          -294.6407542676734
	JGANDALF_NUMBER_OF_HITS        :                       142.0
	JENERGY_ENERGY                 :           99.10458562488608
	JENERGY_CHI2                   :     1.7976931348623157e+308
	JGANDALF_LAMBDA                :      4.2409761837248484e-12
	JGANDALF_NUMBER_OF_ITERATIONS  :                        10.0
	JSTART_NPE_MIP                 :           24.88469697331908
	JSTART_NPE_MIP_TOTAL           :           55.88169412579765
	JSTART_LENGTH_METRES           :           98.89582506402911
	JVETO_NPE                      :                         0.0
	JVETO_NUMBER_OF_HITS           :                         0.0
	JENERGY_MUON_RANGE_METRES      :           344.9767431592819
	JENERGY_NOISE_LIKELIHOOD       :         -333.87773581129136
	JENERGY_NDF                    :                      1471.0
	JENERGY_NUMBER_OF_HITS         :                       101.0

Extracting the energy of every first reco track in each event

# from irods:mc/v5.2/mcv5.2.mupage_10T.sirene.jte.1186.root
f = km3io.OfflineReader("mupage.root")
print(f.events)
# number of tracks per event
print(f.mc_tracks.E.counts)
mask = f.mc_tracks.E.counts > 0
print(f.mc_tracks.E[mask, 0])
Number of events: 12236
[11  2  3 ... 10  1  4]
[17.72 73.213 10884.78 1694.332 1221.061 22945.123 11019.418 ...]

ORCA DU4 RBR Analysis Example

A tiny function to extract track attributes from a list of files

def extract_features(files, features):
    """Gather features from the best reco tracks"""
    data = defaultdict(list)
    for f in tqdm(files):
        tracks = km3io.OfflineReader(f).tracks
        mask = tracks.len.counts > 0
        for feature in features:
            data[feature].append(getattr(tracks, feature)[mask, 0])
    return {k: np.hstack(v) for k, v in data.items()}

Extracting E, lik, pos[xyz] and dir[xyz]

  • Only takes a few seconds per file
  • Results are numpy arrays
sea_files = glob("data/reco-sea/*aanet*.root")
mc_files = glob("data/reco-rbr-muatm/*sirene*aanet*.root")
features = ['E', 'lik', *[e + '_' + q for q in 'xyz' for e in ['pos', 'dir']]]
sea_data = extract_features(sea_files, features)
mc_data = extract_features(mc_files, features)

Plotting some data with matplotlib

fig, ax = plt.subplots()
plot_options = {
    'histtype': 'step',
    'bins': 100,
    'log': True,
    'linewidth': 2
}
ax.hist(sea_data['E'], label="sea data", **plot_options)
ax.hist(mc_data['E'], label="atm. muons MC (JSirene)", **plot_options)
ax.set_xlabel("energy / GeV")
ax.legend(); ax.grid();

Energy Distribution Comparison (example)

./images/orca-du4.png

Command line tool(s)

  • We are working on some counter parts to the Jpp tools
    • KPrintTree -f FILENAME (similar to JPrintTree)
    • more to come (feel free to request or contribute)

Thanks

  • Zineb Aly (CPPM)
  • Tamas Gal (ECAP)
  • Johannes Schumann (ECAP)

Important links