
Add premiere talk

Merged Tamas Gal requested to merge update-talk into master
1 file changed: +54 -22
@@ -10,7 +10,7 @@
#+Subtitle: Reading [KM3NeT] ROOT files without ROOT
#+Author: Zineb Aly (CPPM), Tamas Gal (ECAP) and Johannes Schumann (ECAP)
#+Email: zaly@km3net.de, tgal@km3net.de, jschumann@km3net.de
#+REVEAL_TALK_URL: https://indico.cern.ch/event/878692/
#+REVEAL_TALK_URL: https://indico.cern.ch/event/871318/contributions/3740124
* Export Options :noexport:
** Default
@@ -35,8 +35,7 @@
#+BEGIN_SRC bash :results silent :async t
python3 -m venv ~/tmp/km3io-premiere-venv
. ~/tmp/km3io-premiere-venv/bin/activate
# pip install km3io==0.8.1
pip install -e ~/Dev/km3io
pip install km3io==0.8.2
#+END_SRC
#+BEGIN_SRC elisp
@@ -49,19 +48,29 @@ pip install -e ~/Dev/km3io
* km3io
- [[https://git.km3net.de/km3py/km3io][km3io]]: a tiny Python package with minimal dependencies to read KM3NeT ROOT files
- *Goal*: provide **standalone**, **independent** access to KM3NeT data
- Uses the [[https://github.com/scikit-hep/uproot][uproot]] library to access ROOT data
- Uses the [[https://github.com/scikit-hep/uproot][uproot]] Python library to access ROOT data
- Maximum performance due to [[https://www.numpy.org][numpy]] and [[http://numba.pydata.org][numba]]
- Data are read lazily:
- only loaded when directly accessed
- cut masks on huge datasets without loading them
- 100% test coverage
- Automated testing for Python 3.5, 3.6, 3.7 and 3.8
** uproot
** Data in ~km3io~
- data are read lazily using the [[https://github.com/scikit-hep/awkward-array][awkward-array]] Python library
- only loaded when directly accessed
- apply cut masks on huge datasets without loading them (wholemeal, database-like workflow)
- compatible with [[https://pandas.pydata.org][pandas]]
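The lazy, mask-first workflow described above can be illustrated with a toy stand-in written in plain Python. This is only a sketch of the idea: the ~LazyBranches~ class and the branch names are hypothetical and are not part of ~km3io~, ~uproot~ or ~awkward~.

```python
class LazyBranches:
    """Toy stand-in for a lazily read file: a branch is loaded on first access only."""

    def __init__(self, loaders):
        # loaders maps a branch name to a zero-argument function that reads it
        self._loaders = loaders
        self._cache = {}
        self.loaded = []  # records which branches were actually read

    def __getitem__(self, name):
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
            self.loaded.append(name)
        return self._cache[name]


# Hypothetical branches; in a real file these would be read from disk.
events = LazyBranches({
    "energy": lambda: [17.7, 73.2, 10884.8, 1694.3],
    "n_hits": lambda: [11, 2, 3, 10],
    "raw_waveforms": lambda: [[0] * 1000 for _ in range(4)],  # large, never touched
})

# Build a cut mask from one branch, then apply it to another branch.
mask = [n > 2 for n in events["n_hits"]]
selected = [e for e, keep in zip(events["energy"], mask) if keep]

print(selected)       # energies of the events passing the cut
print(events.loaded)  # only the two branches that were accessed
```

The large ~raw_waveforms~ branch is never read, which is the point of applying cuts before loading data.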
* uproot
:PROPERTIES:
:reveal_background: linear-gradient(to left, #910830, #521623)
:END:
- ROOT I/O (read/write) in pure Python and Numpy
- Created by SciKit-HEP ([[https://scikit-hep.org][https://scikit-hep.org]])
"The Scikit-HEP project is a community-driven and community-oriented project
with the aim of providing Particle Physics at large with an ecosystem for
data analysis in Python. The project started in Autumn 2016 and is in full swing."
- Highly recommended if you live in the Python world
#+REVEAL: split
@@ -70,42 +79,58 @@ pip install -e ~/Dev/km3io
- Very helpful developers (*Jim Pivarski*, one of the main authors, helped a lot with
parsing KM3NeT ROOT files, and we also contributed to uproot)
- The rate of reading data into arrays with ~uproot~ is shown to be faster than
C++ ROOT or ~root_numpy~
C++ ROOT, ~PyROOT~ or ~root_numpy~
- *It's fast!!!*
*** uproot rate / ROOT rate
:PROPERTIES:
:reveal_background: linear-gradient(to left, #910830, #521623)
:END:
[[file:images/uproot_vs_root.png]]
Source: https://github.com/scikit-hep/uproot/blob/master/README.rst
*** uproot rate / ~root_numpy~ rate
:PROPERTIES:
:reveal_background: linear-gradient(to left, #910830, #521623)
:END:
[[file:images/uproot_vs_root_numpy.png]]
Source: https://github.com/scikit-hep/uproot/blob/master/README.rst
** awkward arrays?
:PROPERTIES:
:reveal_background: linear-gradient(to left, #910830, #521623)
:END:
- "Manipulate arrays of complex data structures as easily as Numpy."
- Variable-length lists (jagged/ragged), deeply nested (record structure),
different data types in the same list, etc.
- https://github.com/scikit-hep/awkward-array
- A recommended talk (by Jim himself) on this topic in the HEP context:
https://www.youtube.com/watch?v=2NxWpU7NArk
- ~awkward v1.0~ being rewritten in C++ with focus on ~numba~
- ~awkward v1.0~ being rewritten in C++
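To make "variable-length lists" concrete, here is a minimal sketch of the columnar layout behind a jagged array: one flat content list plus offsets that mark where each event starts and stops. It uses plain Python lists and made-up numbers, so it shows the idea only and is not the ~awkward~ API.

```python
# Jagged data: each event holds a different number of values.
nested = [[1.5, 2.5, 3.0], [], [4.0, 5.5]]

# Columnar layout: all values in one flat list ...
content = [x for event in nested for x in event]

# ... plus offsets marking the event boundaries in that list.
offsets = [0]
for event in nested:
    offsets.append(offsets[-1] + len(event))

# The number of entries per event follows directly from the offsets.
counts = [stop - start for start, stop in zip(offsets[:-1], offsets[1:])]


def get_event(i):
    """Recover the values of event i as a slice of the flat content."""
    return content[offsets[i]:offsets[i + 1]]


print(content)
print(offsets)
print(counts)
print(get_event(2))
```

Storing the data flat is what allows whole-array operations without looping over events one by one.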
** Installation
* Installation of ~km3io~
- Dependencies:
- Python 3.5+
- uproot (a small Python package, installed automatically via ~pip~)
- no binaries!
- *No ROOT, Jpp or aanet* required to read ROOT files
- Releases are published on the official Python package repository:
- ~pip install km3io~
- Releases are published on the official Python package repository
** ~pip install km3io~
:PROPERTIES:
:reveal_background: linear-gradient(to left, #ff12a8, #27aae3)
:END:
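After installing, one way to confirm which release ended up in the active environment is to query the package metadata. The helper name ~installed_version~ is hypothetical, and ~importlib.metadata~ needs Python 3.8 or newer, so this sketch does not cover the older Python versions the package supports.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("km3io")
print(version if version else "km3io is not installed in this environment")
```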
** Why is it so cool?
- Runs on Linux, macOS, Windows, as long as Python 3.5+ is installed
- All data are ~numpy~ arrays or ~awkward~ arrays (~numpy~-compatible arrays of complex data structures)
* Accessing Online (DAQ) Data
* Online (DAQ) Data
:PROPERTIES:
:reveal_background: linear-gradient(to bottom, #27aae3, #000000)
:END:
** km3io supports the following DAQ datatypes
- ~JDAQEvent~ (the event dataformat)
- header information
@@ -138,7 +163,7 @@ print(f.timeslices)
#+RESULTS:
: Number of events: 115038
: Number of summaryslices: 182668
: Available timeslice streams: L1, SN
: Available timeslice streams: SN, L1
*** Investigating timeslice frames
@@ -158,10 +183,10 @@ print(a_timeslice.frames[806451572].pmt[:42])
#+END_SRC
#+RESULTS:
: [ 4 19 3 35 29 21 1 22 6 6 29 21 29 26 3 27 11 4 27 29 13 23 4 28
: 21 24 3 10 25 23 28 25 9 6 14 3 10 25 11 31 10 2]
: [27 27 14 14 18 22 13 13 30 30 12 10 27 13 7 7 15 15 27 11 23 23 12 12
: 18 22 29 29 21 8 1 7 9 9 6 6 23 23 25 26 10 10]
: [33 16 22 24 7 27 4 31 31 15 5 26 30 24 7 26 26 26 27 15 7 3 63 28
: 26 30 25 24 20 7 23 6 22 22 26 15 29 25 24 22 23 21]
: [ 0 9 4 12 13 2 10 10 8 9 27 27 28 18 27 2 6 15 4 2 2 12 27 3
: 15 10 23 14 19 9 9 24 24 6 7 20 7 20 27 22 24 25]
*** Checking the number of UDP packets in summary slices
@@ -189,6 +214,9 @@ print(km3io.daq.get_number_udp_packets(sumslice.dq_status))
#+end_example
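For readers curious how a packet count can be packed into a status word, here is a hedged sketch. It assumes that the number of UDP packets sits in the lower 15 bits of ~dq_status~; that bit layout, the helper name and the example status words are assumptions made for illustration, not taken from this talk.

```python
# Assumption: the lower 15 bits of the status word carry the UDP packet count.
UDP_COUNT_MASK = 0x7FFF


def number_of_udp_packets(dq_status):
    """Extract the packet count from a single status word (hypothetical helper)."""
    return dq_status & UDP_COUNT_MASK


# Made-up status words, with some higher bits set to show that they are ignored.
statuses = [0x80000011, 0x00000010, 0x80017FFF]
packets = [number_of_udp_packets(s) for s in statuses]

print(packets)
```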
* Offline (MC/reco) Data
:PROPERTIES:
:reveal_background: linear-gradient(to bottom, #e3b1e3, #000000)
:END:
** Reading offline files (aka aanet-ROOT files)
- Events
- header information
@@ -207,7 +235,7 @@ print(f)
#+END_SRC
#+RESULTS:
: <km3io.offline.OfflineReader object at 0x1155bde50>
: <km3io.offline.OfflineReader object at 0x10b267f50>
** Investigating events and tracks
#+BEGIN_SRC python :results output replace :session km3io :exports both
@@ -318,6 +346,11 @@ mask = f.mc_tracks.E.counts > 0
print(f.mc_tracks.E[mask, 0])
#+END_SRC
#+RESULTS:
: Number of events: 12236
: [11 2 3 ... 10 1 4]
: [17.72 73.213 10884.78 1694.332 1221.061 22945.123 11019.418 ...]
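The ~counts > 0~ mask matters because asking for the first entry of an event without any tracks would fail. A plain-Python equivalent of that selection, using made-up energies rather than values from the file, might look like this:

```python
# Made-up jagged energies: the second event has no tracks at all.
energies = [[17.72, 3.1], [], [73.213], [10884.78, 5.0, 2.2]]

# Number of tracks per event, and a mask keeping only non-empty events.
counts = [len(event) for event in energies]
mask = [c > 0 for c in counts]

# First track energy of every event that passes the mask.
first = [event[0] for event, keep in zip(energies, mask) if keep]

print(counts)
print(first)
```

Without the mask, ~event[0]~ on the empty event would raise an ~IndexError~.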
* ORCA DU4 RBR Analysis Example
** A tiny function to extract track attributes from a list of files
@@ -366,8 +399,7 @@ ax.legend(); ax.grid();
* Command line tool(s)
- We are working on some counterparts to the Jpp tools
- ~KPrintTree -f FILENAME~
- similar to ~JPrintTree~
- ~KPrintTree -f FILENAME~ (similar to ~JPrintTree~)
- more to come (feel free to request or contribute)
* Thanks