From a819f0e39da4724b9e87fdb95e2cec6709994ad6 Mon Sep 17 00:00:00 2001
From: Tamas Gal <tgal@km3net.de>
Date: Fri, 14 Feb 2020 09:56:12 +0100
Subject: [PATCH] First final version of the talk

---
 talks/premiere.org | 76 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 54 insertions(+), 22 deletions(-)

diff --git a/talks/premiere.org b/talks/premiere.org
index be87d18..b03dec0 100644
--- a/talks/premiere.org
+++ b/talks/premiere.org
@@ -10,7 +10,7 @@
 #+Subtitle: Reading [KM3NeT] ROOT files without ROOT
 #+Author: Zineb Aly (CPPM), Tamas Gal (ECAP) and Johannes Schumann (ECAP)
 #+Email: zaly@km3et.de, tgal@km3net.de, jschumann@km3net.de
-#+REVEAL_TALK_URL: https://indico.cern.ch/event/878692/
+#+REVEAL_TALK_URL: https://indico.cern.ch/event/871318/contributions/3740124
 
 * Export Options :noexport:
 ** Default
@@ -35,8 +35,7 @@
 #+BEGIN_SRC bash :results silent :async t
 python3 -m venv ~/tmp/km3io-premiere-venv
 . ~/tmp/km3io-premiere-venv/bin/activate
-# pip install km3io==0.8.1
-pip install -e ~/Dev/km3io
+pip install km3io==0.8.2
 #+END_SRC
 
 #+BEGIN_SRC elisp
@@ -49,19 +48,29 @@ pip install -e ~/Dev/km3io
 * km3io
 - [[https://git.km3net.de/km3py/km3io][km3io]]: a tiny Python package with minimal dependencies to read KM3NeT ROOT files
 - *Goal*: provide a **standalone**, **independent** access to KM3NeT data
-- Uses the [[https://github.com/scikit-hep/uproot][uproot]] library to access ROOT data
+- Uses the [[https://github.com/scikit-hep/uproot][uproot]] Python library to access ROOT data
 - Maximum performance due to [[https://www.numpy.org][numpy]] and [[http://numba.pydata.org][numba]]
-- Data are read lazily:
-  - only loaded when directly accessed
-  - cut masks on huge datasets without loading them
+- 100% test coverage
+- Automated testing for Python 3.5, 3.6, 3.7 and 3.8
 
-** uproot
+
+** Data in ~km3io~
+- data are read lazily using the [[https://github.com/scikit-hep/awkward-array][awkwardarray]] Python library
+- only loaded when directly accessed
+- apply cut masks on huge datasets without loading them (wholemeal, database-like workflow)
+- compatible with [[https://pandas.pydata.org][pandas]]
+
+* uproot
+:PROPERTIES:
+:reveal_background: linear-gradient(to left, #910830, #521623)
+:END:
 - ROOT I/O (read/write) in pure Python and Numpy
 - Created by SciKit-HEP ([[https://scikit-hep.org][https://scikit-hep.org]])
 
   "The Scikit-HEP project is a community-driven and community-oriented project
   with the aim of providing Particle Physics at large with an ecosystem for data
   analysis in Python. The project started in Autumn 2016 and is in full swing."
+- Highly recommended if you live in the Python world
 
 #+REVEAL: split
 
@@ -70,42 +79,58 @@ pip install -e ~/Dev/km3io
 - Very helpful developers (*Jim Pivarski*, one of the main authors helped a lot
   to parse KM3NeT ROOT files and we also contributed to uproot)
 - The rate of reading data into arrays with ~uproot~ is shown to be faster than
-  C++ ROOT or ~root_numpy~
+  C++ ROOT, ~PyROOT~ or ~root_numpy~
 - *It's fast!!!*
 
 *** uproot rate / ROOT rate
+:PROPERTIES:
+:reveal_background: linear-gradient(to left, #910830, #521623)
+:END:
 [[file:images/uproot_vs_root.png]]
 
 Source: https://github.com/scikit-hep/uproot/blob/master/README.rst
 
 *** uproot rate / ~root_numpy~ rate
+:PROPERTIES:
+:reveal_background: linear-gradient(to left, #910830, #521623)
+:END:
 [[file:images/uproot_vs_root_numpy.png]]
 
 Source: https://github.com/scikit-hep/uproot/blob/master/README.rst
 
 ** awkward arrays?
+:PROPERTIES:
+:reveal_background: linear-gradient(to left, #910830, #521623)
+:END:
 - "Manipulate arrays of complex data structures as easily as Numpy."
 - Variable-length lists (jagged/ragged), deeply nested (record structure),
   different data types in the same list, etc.
 - https://github.com/scikit-hep/awkward-array
 - A recommended talk (by Jim himself) on this topic in the HEP context: https://www.youtube.com/watch?v=2NxWpU7NArk
-- ~awkward v1.0~ being rewritten in C++ with focus on ~numba~
+- ~awkward v1.0~ being rewritten in C++
 
-** Installation
+* Installation of ~km3io~
 - Dependencies:
   - Python 3.5+
   - uproot (a small Python package, installed automatically via ~pip~)
 - no binaries!
 - *No ROOT, Jpp or aanet* required to read ROOT files
-- Releases are published on the official Python package repository:
-  - ~pip install km3io~
+- Releases are published on the official Python package repository
+
+** ~pip install km3io~
+:PROPERTIES:
+:reveal_background: linear-gradient(to left, #ff12a8, #27aae3)
+:END:
 
 ** Why is it so cool?
 - Runs on Linux, macOS, Windows, as long as Python 3.5+ is installed
 - Every data is a ~numpy~ array or ~awkward~ array (~numpy~ compatible array of
   complex data structures)
 
-* Accessing Online (DAQ) Data
+* Online (DAQ) Data
+:PROPERTIES:
+:reveal_background: linear-gradient(to bottom, #27aae3, #000000)
+:END:
 ** km3io supports the following DAQ datatypes
 - ~JDAQEvent~ (the event dataformat)
   - header information
@@ -138,7 +163,7 @@ print(f.timeslices)
 #+RESULTS:
 : Number of events: 115038
 : Number of summaryslices: 182668
-: Available timeslice streams: L1, SN
+: Available timeslice streams: SN, L1
 
 *** Investigating timeslice frames
 
@@ -158,10 +183,10 @@ print(a_timeslice.frames[806451572].pmt[:42])
 #+END_SRC
 
 #+RESULTS:
-: [ 4 19  3 35 29 21  1 22  6  6 29 21 29 26  3 27 11  4 27 29 13 23  4 28
-:  21 24  3 10 25 23 28 25  9  6 14  3 10 25 11 31 10  2]
-: [27 27 14 14 18 22 13 13 30 30 12 10 27 13  7  7 15 15 27 11 23 23 12 12
-:  18 22 29 29 21  8  1  7  9  9  6  6 23 23 25 26 10 10]
+: [33 16 22 24  7 27  4 31 31 15  5 26 30 24  7 26 26 26 27 15  7  3 63 28
+:  26 30 25 24 20  7 23  6 22 22 26 15 29 25 24 22 23 21]
+: [ 0  9  4 12 13  2 10 10  8  9 27 27 28 18 27  2  6 15  4  2  2 12 27  3
+:  15 10 23 14 19  9  9 24 24  6  7 20  7 20 27 22 24 25]
 
 *** Checking the number of UDP packets in summary slices
 
@@ -189,6 +214,9 @@ print(km3io.daq.get_number_udp_packets(sumslice.dq_status))
 #+end_example
 
 * Offline (MC/reco) Data
+:PROPERTIES:
+:reveal_background: linear-gradient(to bottom, #e3b1e3, #000000)
+:END:
 ** Reading offline files (aka aanet-ROOT files)
 - Events
   - header information
@@ -207,7 +235,7 @@ print(f)
 #+END_SRC
 
 #+RESULTS:
-: <km3io.offline.OfflineReader object at 0x1155bde50>
+: <km3io.offline.OfflineReader object at 0x10b267f50>
 
 ** Investigating events and tracks
 #+BEGIN_SRC python :results output replace :session km3io :exports both
@@ -318,6 +346,11 @@ mask = f.mc_tracks.E.counts > 0
 print(f.mc_tracks.E[mask, 0])
 #+END_SRC
 
+#+RESULTS:
+: Number of events: 12236
+: [11  2  3 ... 10  1  4]
+: [17.72 73.213 10884.78 1694.332 1221.061 22945.123 11019.418 ...]
+
 * ORCA DU4 RBR Analysis Example
 
 ** A tiny function to extract track attributes from a list of files
@@ -366,8 +399,7 @@ ax.legend(); ax.grid();
 
 * Command line tool(s)
 - We are working on some counter parts to the Jpp tools
-  - ~KPrintTree -f FILENAME~
-    - similar to ~JPrintTree~
+  - ~KPrintTree -f FILENAME~ (similar to ~JPrintTree~)
   - more to come (feel free to request or contribute)
 
 * Thanks
-- 
GitLab