The km3io Python package
========================

.. image:: https://git.km3net.de/km3py/km3io/badges/master/build.svg
    :target: https://git.km3net.de/km3py/km3io/pipelines

.. image:: https://git.km3net.de/km3py/km3io/badges/master/coverage.svg
    :target: https://km3py.pages.km3net.de/km3io/coverage

.. image:: https://api.codacy.com/project/badge/Grade/0660338483874475ba04f324de2123ec
    :target: https://www.codacy.com/manual/tamasgal/km3io?utm_source=github.com&utm_medium=referral&utm_content=KM3NeT/km3io&utm_campaign=Badge_Grade

.. image:: https://examples.pages.km3net.de/km3badges/docs-latest-brightgreen.svg
    :target: https://km3py.pages.km3net.de/km3io

This software provides a set of Python classes to read KM3NeT ROOT files
without having ROOT, Jpp or aanet installed. It only depends on Python 3.5+ and the amazing `uproot <https://github.com/scikit-hep/uproot>`__ package and gives you access to the data via numpy arrays.

It's very easy to use and according to the `uproot <https://github.com/scikit-hep/uproot>`__ benchmarks, it is able to outperform the ROOT I/O performance. 

**Note:** Beware that this package is in the development phase, so the API will change until version ``1.0.0`` is released!

Installation
============

Install km3io using pip::

    pip install km3io 

To get the latest (stable) development release::

    pip install git+https://git.km3net.de/km3py/km3io.git

**Reminder:** km3io is **not** dependent on aanet, ROOT or Jpp! 

Questions
=========

If you have a question about km3io, please proceed as follows:

- Read the documentation below.
- Explore the `examples <https://km3py.pages.km3net.de/km3io/examples.html>`__ in the documentation.
- Haven't you found an answer to your question in the documentation, post a git issue with your question showing us an example of what you have tried first, and what you would like to do.
- Have you noticed a bug, please post it in a git issue, we appreciate your contribution.

Tutorial
========

**Table of contents:**

* `Introduction <#introduction>`__

  * `Overview of daq files <#overview-of-daq-files>`__

  * `Overview of offline files <#overview-of-offline-files>`__

* `DAQ files reader <#daq-files-reader>`__

* `Offline files reader <#offline-file-reader>`__

  * `reading events data <#reading-events-data>`__

  * `reading hits data <#reading-hits-data>`__

  * `reading tracks data <#reading-tracks-data>`__

  * `reading mc hits data <#reading-mc-hits-data>`__

  * `reading mc tracks data <#reading-mc-tracks-data>`__


Introduction
------------

Most of km3net data is stored in root files. These root files are either created with `Jpp <https://git.km3net.de/common/jpp>`__ or `aanet <https://git.km3net.de/common/aanet>`__ software. A root file created with 
`Jpp <https://git.km3net.de/common/jpp>`__ is often referred to as "a Jpp root file". Similarly, a root file created with `aanet <https://git.km3net.de/common/aanet>`__ is often referred to as "an aanet file". In km3io, an aanet root file will always be reffered to as an ``offline file``, while a Jpp root file will always be referred to as a ``daq file``.

km3io is a Python package that provides a set of classes (``DAQReader`` and ``OfflineReader``) to read both daq root files and offline root files without any dependency to aanet, Jpp or ROOT. 

Data in km3io is often returned as a "lazyarray", a "jagged lazyarray" or a `Numpy <https://docs.scipy.org/doc/numpy>`__ array. A lazyarray is an array-like object that reads data on demand! In a lazyarray, only the first and the last chunks of data are read in memory. A lazyarray can be used with all Numpy's universal `functions <https://docs.scipy.org/doc/numpy/reference/ufuncs.html>`__. Here is how a lazyarray looks like:

.. code-block:: python3

    # <ChunkedArray [5971 5971 5971 ... 5971 5971 5971] at 0x7fb2341ad810>


A jagged array, is a 2+ dimentional array with different arrays lengths. In other words, a jagged array is an array of arrays of different sizes. So a jagged lazyarray is simply a jagged array of lazyarrays with different sizes. Here is how a jagged lazyarray looks like:


.. code-block:: python3

    # <JaggedArray [[102 102 102 ... 11517 11518 11518] [] [101 101 102 ... 11518 11518 11518] ... [101 101 102 ... 11516 11516 11517] [] [101 101 101 ... 11517 11517 11518]] at 0x7f74b0ef8810>


Overview of daq files
"""""""""""""""""""""
# info needed here

Overview of offline files
"""""""""""""""""""""""""

# info needed here

DAQ files reader
----------------

# an update is needed here?

Currently only events (the ``KM3NET_EVENT`` tree) are supported but timeslices and summaryslices will be implemented very soon.

Let's have a look at some ORCA data (``KM3NeT_00000044_00005404.root``)

To get a lazy ragged array of the events:

.. code-block:: python3

  import km3io as ki
  events = ki.DAQReader("KM3NeT_00000044_00005404.root").events


That's it! Now let's have a look at the hits data:

.. code-block:: python3

  >>> events
  Number of events: 17023
  >>> events[23].snapshot_hits.tot
  array([28, 22, 17, 29,  5, 27, 24, 26, 21, 28, 26, 21, 26, 24, 17, 28, 23,29, 27, 24, 23, 26, 29, 25, 18, 28, 24, 28, 26, 20, 25, 31, 28, 23, 26, 21, 30, 33, 27, 16, 23, 24, 19, 24, 27, 22, 23, 21, 25, 16, 28, 22, 22, 29, 24, 29, 24, 24, 25, 25, 21, 31, 26, 28, 30, 42, 28], dtype=uint8)


Offline files reader
--------------------

Let's have a look at some muons data from ORCA 4 lines simulations - run id 5971 (``datav6.0test.jchain.aanet.00005971.root``). 

**Note:** this file was cropped to 10 events only, so don't be surprised in this tutorial if you see few events in the file.

First, let's read our file:

.. code-block:: python3

  >>> import km3io as ki
  >>> file = 'datav6.0test.jchain.aanet.00005971.root'
  >>> r = ki.OfflineReader(file)
  <km3io.aanet.OfflineReader at 0x7f24cc2bd550>

and that's it! Note that `file` can be either an str of your file path, or a path-like object. 

To explore all the available branches in our offline file: 

.. code-block:: python3

  >>> r.keys
  Events keys are:
        id
        det_id
        mc_id
        run_id
        mc_run_id
        frame_index
        trigger_mask
        trigger_counter
        overlays
        hits
        trks
        w
        w2list
        w3list
        mc_t
        mc_hits
        mc_trks
        comment
        index
        flags
        t.fSec
        t.fNanoSec
  Hits keys are:
        hits.id
        hits.dom_id
        hits.channel_id
        hits.tdc
        hits.tot
        hits.trig
        hits.pmt_id
        hits.t
        hits.a
        hits.pos.x
        hits.pos.y
        hits.pos.z
        hits.dir.x
        hits.dir.y
        hits.dir.z
        hits.pure_t
        hits.pure_a
        hits.type
        hits.origin
        hits.pattern_flags
  Tracks keys are:
        trks.fUniqueID
        trks.fBits
        trks.id
        trks.pos.x
        trks.pos.y
        trks.pos.z
        trks.dir.x
        trks.dir.y
        trks.dir.z
        trks.t
        trks.E
        trks.len
        trks.lik
        trks.type
        trks.rec_type
        trks.rec_stages
        trks.status
        trks.mother_id
        trks.fitinf
        trks.hit_ids
        trks.error_matrix
        trks.comment
  Mc hits keys are:
        mc_hits.id
        mc_hits.dom_id
        mc_hits.channel_id
        mc_hits.tdc
        mc_hits.tot
        mc_hits.trig
        mc_hits.pmt_id
        mc_hits.t
        mc_hits.a
        mc_hits.pos.x
        mc_hits.pos.y
        mc_hits.pos.z
        mc_hits.dir.x
        mc_hits.dir.y
        mc_hits.dir.z
        mc_hits.pure_t
        mc_hits.pure_a
        mc_hits.type
        mc_hits.origin
        mc_hits.pattern_flags
  Mc tracks keys are:
        mc_trks.fUniqueID
        mc_trks.fBits
        mc_trks.id
        mc_trks.pos.x
        mc_trks.pos.y
        mc_trks.pos.z
        mc_trks.dir.x
        mc_trks.dir.y
        mc_trks.dir.z
        mc_trks.t
        mc_trks.E
        mc_trks.len
        mc_trks.lik
        mc_trks.type
        mc_trks.rec_type
        mc_trks.rec_stages
        mc_trks.status
        mc_trks.mother_id
        mc_trks.fitinf
        mc_trks.hit_ids
        mc_trks.error_matrix
        mc_trks.comment

In an offline file, there are 5 main trees with data: 

* events tree
* hits tree
* tracks tree
* mc hits tree
* mc tracks tree

with km3io, these trees can be accessed with a simple tab completion: 

.. image:: https://git.km3net.de/km3py/km3io/raw/master/examples/pictures/reader.png

In the following, we will explore each tree using km3io package. 

reading events data
"""""""""""""""""""

to read data in events tree with km3io: 

.. code-block:: python3

  >>> r.events
  <OfflineEvents: 10 parsed events>

to get the total number of events in the events tree:

.. code-block:: python3

  >>> len(r.events)
  10

the branches stored in the events tree in an offline file can be easily accessed with a tab completion as seen below:

.. image:: https://git.km3net.de/km3py/km3io/raw/master/examples/pictures/events.png

to get data from the events tree, chose any branch of interest with the tab completion, the following is a non exaustive set of examples. 

to get event ids:

.. code-block:: python3

    >>> r.events.id
    <ChunkedArray [1 2 3 ... 8 9 10] at 0x7f249eeb6f10>

to get detector ids:

.. code-block:: python3

    >>> r.events.det_id
    <ChunkedArray [44 44 44 ... 44 44 44] at 0x7f249eeba050>

to get frame_index:

.. code-block:: python3

    >>> r.events.frame_index
    <ChunkedArray [182 183 202 ... 185 185 204] at 0x7f249eeba410>

to get snapshot hits:

.. code-block:: python3

    >>> r.events.hits
    <ChunkedArray [176 125 318 ... 84 255 105] at 0x7f249eebaa10>

to illustrate the strength of this data structure, we will play around with `r.events.hits` using Numpy universal `functions <https://docs.scipy.org/doc/numpy/reference/ufuncs.html>`__. 

.. code-block:: python3

    >>> import numpy as np
    >>> np.log(r.events.hits)
    <ChunkedArray [5.170483995038151 4.8283137373023015 5.762051382780177 ... 4.430816798843313 5.541263545158426 4.653960350157523] at 0x7f249b8ebb90>

to get all data from one specific event (for example event 0):

.. code-block:: python3

    >>> r.events[0]
    offline event:
          id                  :               1
          det_id              :              44
          mc_id               :               0
          run_id              :            5971
          mc_run_id           :               0
          frame_index         :             182
          trigger_mask        :              22
          trigger_counter     :               0
          overlays            :              60
          hits                :             176
          trks                :              56
          w                   :              []
          w2list              :              []
          w3list              :              []
          mc_t                :             0.0
          mc_hits             :               0
          mc_trks             :               0
          comment             :             b''
          index               :               0
          flags               :               0
          t_fSec              :      1567036818
          t_fNanoSec          :       200000000

to get a specific value from event 0, for example the number of overlays:

.. code-block:: python3

    >>> r.events[0].overlays
    60

or the number of hits: 

.. code-block:: python3

    >>> r.events[0].hits
    176


reading hits data
"""""""""""""""""

to read data in hits tree with km3io:

.. code-block:: python3

    >>> r.hits
    <OfflineHits: 10 parsed elements>

this shows that in our offline file, there are 10 events, with each event is associated a hits trees. 

to have access to all data in a specific branche from the hits tree, you can use the tab completion:

.. image:: https://git.km3net.de/km3py/km3io/raw/master/examples/pictures/hits.png

to get ALL the dom ids in all hits trees in our offline file:

.. code-block:: python3

    >>> r.hits.dom_id
    <ChunkedArray [[806451572 806451572 806451572 ... 809544061 809544061 809544061] [806451572 806451572 806451572 ... 809524432 809526097 809544061] [806451572 806451572 806451572 ... 809544061 809544061 809544061] ... [806451572 806455814 806465101 ... 809526097 809544058 809544061] [806455814 806455814 806455814 ... 809544061 809544061 809544061] [806455814 806455814 806455814 ... 809544058 809544058 809544061]] at 0x7f249eebac50>

to get ALL the time over threshold (tot) in all hits trees in our offline file:

.. code-block:: python3

    >>> r.hits.tot
    <ChunkedArray [[24 30 22 ... 38 26 23] [29 26 22 ... 26 28 24] [27 19 13 ... 27 24 16] ... [22 22 9 ... 27 32 27] [30 32 17 ... 30 24 29] [27 41 36 ... 29 24 28]] at 0x7f249eec9050>


if you are interested in a specific event (let's say event 0), you can access the corresponding hits tree by doing the following:

.. code-block:: python3

    >>> r[0].hits
    <OfflineHits: 176 parsed elements>

notice that now there are 176 parsed elements (as opposed to 10 elements parsed when r.hits is called). This means that in event 0 there are 176 hits! To get the dom ids from this event:

.. code-block:: python3

    >>> r[0].hits.dom_id
    array([806451572, 806451572, 806451572, 806451572, 806455814, 806455814,
       806455814, 806483369, 806483369, 806483369, 806483369, 806483369,
       806483369, 806483369, 806483369, 806483369, 806483369, 806487219,
       806487226, 806487231, 806487231, 808432835, 808435278, 808435278,
       808435278, 808435278, 808435278, 808447180, 808447180, 808447180,
       808447180, 808447180, 808447180, 808447180, 808447180, 808447186,
       808451904, 808451904, 808472265, 808472265, 808472265, 808472265,
       808472265, 808472265, 808472265, 808472265, 808488895, 808488990,
       808488990, 808488990, 808488990, 808488990, 808489014, 808489014,
       808489117, 808489117, 808489117, 808489117, 808493910, 808946818,
       808949744, 808951460, 808951460, 808951460, 808951460, 808951460,
       808956908, 808956908, 808959411, 808959411, 808959411, 808961448,
       808961448, 808961504, 808961504, 808961655, 808961655, 808961655,
       808964815, 808964815, 808964852, 808964908, 808969857, 808969857,
       808969857, 808969857, 808969857, 808972593, 808972698, 808972698,
       808972698, 808974758, 808974758, 808974758, 808974758, 808974758,
       808974758, 808974758, 808974758, 808974758, 808974758, 808974758,
       808974773, 808974773, 808974773, 808974773, 808974773, 808974972,
       808974972, 808976377, 808976377, 808976377, 808979567, 808979567,
       808979567, 808979721, 808979721, 808979721, 808979721, 808979721,
       808979721, 808979721, 808979729, 808979729, 808979729, 808981510,
       808981510, 808981510, 808981510, 808981672, 808981672, 808981672,
       808981672, 808981672, 808981672, 808981672, 808981672, 808981672,
       808981672, 808981672, 808981672, 808981672, 808981672, 808981672,
       808981672, 808981672, 808981812, 808981812, 808981812, 808981864,
       808981864, 808982005, 808982005, 808982005, 808982018, 808982018,
       808982018, 808982041, 808982041, 808982077, 808982077, 808982547,
       808982547, 808982547, 808997793, 809006037, 809524432, 809526097,
       809526097, 809544061, 809544061, 809544061, 809544061, 809544061,
       809544061, 809544061], dtype=int32

to get all data of a specific hit (let's say hit 0) from event 0:

.. code-block:: python3

    >>>r[0].hits[0]
    offline hit:
          id                  :               0
          dom_id              :       806451572
          channel_id          :               8
          tdc                 :               0
          tot                 :              24
          trig                :               1
          pmt_id              :               0
          t                   :      70104010.0
          a                   :             0.0
          pos_x               :             0.0
          pos_y               :             0.0
          pos_z               :             0.0
          dir_x               :             0.0
          dir_y               :             0.0
          dir_z               :             0.0
          pure_t              :             0.0
          pure_a              :             0.0
          type                :               0
          origin              :               0
          pattern_flags       :               0

to get a specific value from hit 0 in event 0, let's say for example the dom id:

.. code-block:: python3

    >>>r[0].hits[0].dom_id
    806451572

reading tracks data
"""""""""""""""""""

to read data in tracks tree with km3io:

.. code-block:: python3

    >>> r.tracks
    <OfflineTracks: 10 parsed elements>

this shows that in our offline file, there are 10 parsed elements (events), each event is associated with tracks data. 

to have access to all data in a specific branche from the tracks tree, you can use the tab completion:

.. image:: https://git.km3net.de/km3py/km3io/raw/master/examples/pictures/tracks.png

to get ALL the cos(zenith angle) in all tracks tree in our offline file:

.. code-block:: python3

    >>> r.tracks.dir_z
    <ChunkedArray [[-0.872885221293917 -0.872885221293917 -0.872885221293917 ... -0.6631226836266504 -0.5680647731737454 -0.5680647731737454] [-0.8351996698137462 -0.8351996698137462 -0.8351996698137462 ... -0.7485107718446855 -0.8229838871876581 -0.239315690284641] [-0.989148723802379 -0.989148723802379 -0.989148723802379 ... -0.9350162572437829 -0.88545604390297 -0.88545604390297] ... [-0.5704611045902105 -0.5704611045902105 -0.5704611045902105 ... -0.9350162572437829 -0.4647231989130516 -0.4647231989130516] [-0.9779941383490359 -0.9779941383490359 -0.9779941383490359 ... -0.88545604390297 -0.88545604390297 -0.8229838871876581] [-0.7396916780974963 -0.7396916780974963 -0.7396916780974963 ... -0.6631226836266504 -0.7485107718446855 -0.7485107718446855]] at 0x7f249eed2090>

to get ALL the tracks likelihood in our offline file:

.. code-block:: python3

    >>> r.tracks.lik
    <ChunkedArray [[294.6407542676734 294.6407542676734 294.6407542676734 ... 67.81221253265059 67.7756405143316 67.77250505700384] [96.75133289411137 96.75133289411137 96.75133289411137 ... 39.21916536442286 39.184645826013806 38.870325146341884] [560.2775306614813 560.2775306614813 560.2775306614813 ... 118.88577278801066 118.72271313687405 117.80785995187605] ... [71.03251451148226 71.03251451148226 71.03251451148226 ... 16.714140573909347 16.444395245214945 16.34639241716669] [326.440133294878 326.440133294878 326.440133294878 ... 87.79818671079849 87.75488082571873 87.74839444768625] [159.77779654216795 159.77779654216795 159.77779654216795 ... 33.8669134999348 33.821631538334984 33.77240735670646]] at 0x7f249eed2590>


if you are interested in a specific event (let's say event 0), you can access the corresponding tracks tree by doing the following:

.. code-block:: python3

    >>> r[0].tracks
    <OfflineTracks: 56 parsed elements>

notice that now there are 56 parsed elements (as opposed to 10 elements parsed when r.tracks is called). This means that in event 0 there is data about 56 possible tracks! To get the tracks likelihood from this event:

.. code-block:: python3

    >>> r[0].tracks.lik
    array([294.64075427, 294.64075427, 294.64075427, 291.64653113,
       291.27392663, 290.69031512, 289.19290546, 289.08449217,
       289.03373947, 288.19030836, 282.92343367, 282.71527118,
       282.10762402, 280.20553861, 275.93183966, 273.01809111,
       257.46433694, 220.94357656, 194.99426403, 190.47809685,
        79.95235686,  78.94389763,  78.90791169,  77.96122466,
        77.9579604 ,  76.90769883,  75.97546175,  74.91530508,
        74.9059469 ,  72.94007716,  72.90467038,  72.8629316 ,
        72.81280833,  72.80229533,  72.78899435,  71.82404165,
        71.80085542,  71.71028058,  70.91130096,  70.89150223,
        70.85845637,  70.79081796,  70.76929743,  69.80667603,
        69.64058976,  68.93085058,  68.84304037,  68.83154232,
        68.79944298,  68.79019375,  68.78581291,  68.72340328,
        67.86628937,  67.81221253,  67.77564051,  67.77250506])

to get all data of a specific track (let's say track 0) from event 0:

.. code-block:: python3

    >>>r[0].tracks[0]
    offline track:
          fUniqueID                      :                           0
          fBits                          :                    33554432
          id                             :                           1
          pos_x                          :            445.835395997812
          pos_y                          :           615.1089636184813
          pos_z                          :           125.1448339836911
          dir_x                          :          0.0368711082700674
          dir_y                          :        -0.48653048395923415
          dir_z                          :          -0.872885221293917
          t                              :           70311446.46401498
          E                              :           99.10458562488608
          len                            :                         0.0
          lik                            :           294.6407542676734
          type                           :                           0
          rec_type                       :                        4000
          rec_stages                     :                [1, 3, 5, 4]
          status                         :                           0
          mother_id                      :                          -1
          hit_ids                        :                          []
          error_matrix                   :                          []
          comment                        :                           0
          JGANDALF_BETA0_RAD             :        0.004957442219414389
          JGANDALF_BETA1_RAD             :        0.003417848024252858
          JGANDALF_CHI2                  :          -294.6407542676734
          JGANDALF_NUMBER_OF_HITS        :                       142.0
          JENERGY_ENERGY                 :           99.10458562488608
          JENERGY_CHI2                   :     1.7976931348623157e+308
          JGANDALF_LAMBDA                :      4.2409761837248484e-12
          JGANDALF_NUMBER_OF_ITERATIONS  :                        10.0
          JSTART_NPE_MIP                 :           24.88469697331908
          JSTART_NPE_MIP_TOTAL           :           55.88169412579765
          JSTART_LENGTH_METRES           :           98.89582506402911
          JVETO_NPE                      :                         0.0
          JVETO_NUMBER_OF_HITS           :                         0.0
          JENERGY_MUON_RANGE_METRES      :           344.9767431592819
          JENERGY_NOISE_LIKELIHOOD       :         -333.87773581129136
          JENERGY_NDF                    :                      1471.0
          JENERGY_NUMBER_OF_HITS         :                       101.0

to get a specific value from track 0 in event 0, let's say for example the liklihood:

.. code-block:: python3

    >>>r[0].tracks[0].lik
    294.6407542676734


reading mc hits data
""""""""""""""""""""

to read mc hits data:

.. code-block:: python3

    >>>r.mc_hits
    <OfflineHits: 10 parsed elements>

that's it! All branches in mc hits tree can be accessed in the exact same way described in the section `reading hits data <#reading-hits-data>`__ . All data is easily accesible and if you are stuck, hit tab key to see all the available branches:

.. image:: https://git.km3net.de/km3py/km3io/raw/master/examples/pictures/mc_hits.png

reading mc tracks data
""""""""""""""""""""""

to read mc tracks data:

.. code-block:: python3

    >>>r.mc_tracks
    <OfflineTracks: 10 parsed elements>

that's it! All branches in mc tracks tree can be accessed in the exact same way described in the section `reading tracks data <#reading-tracks-data>`__ . All data is easily accesible and if you are stuck, hit tab key to see all the available branches:

.. image:: https://git.km3net.de/km3py/km3io/raw/master/examples/pictures/mc_tracks.png