@@ -43,169 +43,133 @@ If you have a question about km3io, please proceed as follows:
- If you have not found an answer to your question in the documentation, post a git issue with your question, showing us an example of what you have tried and what you would like to do.
- If you have noticed a bug, please post it in a git issue; we appreciate your contribution.
Tutorial
========
**Table of contents:**
Introduction
------------
* `Introduction <#introduction>`__
Most KM3NeT data is stored in ROOT files. These ROOT files are created using the `KM3NeT Dataformat library <https://git.km3net.de/common/km3net-dataformat>`__.
A ROOT file created with
`Jpp <https://git.km3net.de/common/jpp>`__ is an "online" file and all other software usually produces "offline" files.
* `Overview of online files <#overview-of-online-files>`__
km3io is a Python package that provides a set of classes: ``OnlineReader``, ``OfflineReader`` and a special class to read gSeaGen files. All of these ROOT files can be read without installing any other software like Jpp, aanet or ROOT.
* `Overview of offline files <#overview-of-offline-files>`__
Data in km3io is returned as ``awkward.Array``, which is an advanced NumPy-like container type that stores
contiguous data for high-performance computations.
Such an ``awkward.Array`` supports any level of nested arrays and records, which can have different lengths, in contrast to NumPy where everything has to be rectangular.
* `Online files reader <#online-files-reader>`__
The example below shows the array which contains the ``dir_z`` values
of each track of the first 4 events. The type ``4 * var * float64`` means that
it has 4 subarrays with variable lengths of type ``float64``:
The same concept applies to everything, including ``hits``, ``mc_hits``,
``mc_tracks``, ``t_sec`` etc.
* `Offline files reader <#offline-file-reader>`__
Offline files reader
--------------------
* `reading events data <#reading-events-data>`__
In general an offline file has two methods to fetch data: the header and the events. Let's start with the header.
* `reading usr data of events <#reading-usr-data-of-events>`__
Reading the file header
"""""""""""""""""""""""
* `reading hits data <#reading-hits-data>`__
To read an offline file, start by opening it with an ``OfflineReader``:
* `reading tracks data <#reading-tracks-data>`__
.. code-block:: python3
* `reading mc hits data <#reading-mc-hits-data>`__
>>> import km3io
>>> from km3net_testdata import data_path
>>> f = km3io.OfflineReader(data_path("offline/numucc.root"))
* `reading mc tracks data <#reading-mc-tracks-data>`__
Calling the header can be done with:
.. code-block:: python3
>>> f.header
<km3io.offline.Header at 0x7fcd81025990>
Introduction
------------
The header provides lazy access. In offline files the header is unique and can be printed:
Most KM3NeT data is stored in ROOT files. These ROOT files are created with either the `Jpp <https://git.km3net.de/common/jpp>`__ or the `aanet <https://git.km3net.de/common/aanet>`__ software. A ROOT file created with
`Jpp <https://git.km3net.de/common/jpp>`__ is often referred to as "a Jpp root file". Similarly, a ROOT file created with `aanet <https://git.km3net.de/common/aanet>`__ is often referred to as "an aanet file". In km3io, an aanet ROOT file will always be referred to as an ``offline file``, while a Jpp ROOT file will always be referred to as an ``online file``.
.. code-block:: python3
km3io is a Python package that provides a set of classes (``OnlineReader`` and ``OfflineReader``) to read both online and offline ROOT files without any dependency on aanet, Jpp or ROOT.
Data in km3io is often returned as a "lazyarray", a "jagged lazyarray" or a `Numpy <https://docs.scipy.org/doc/numpy>`__ array. A lazyarray is an array-like object that reads data on demand. In a lazyarray, only the first and the last chunks of data are read into memory. A lazyarray can be used with all of Numpy's universal `functions <https://docs.scipy.org/doc/numpy/reference/ufuncs.html>`__. Here is what a lazyarray looks like:
To read the values in the header one can call them directly:
A jagged array is an array with two or more dimensions whose subarrays have different lengths. In other words, a jagged array is an array of arrays of different sizes. A jagged lazyarray is therefore simply a jagged array of lazyarrays with different sizes. Here is what a jagged lazyarray looks like:
Reading events
""""""""""""""
To start reading events, access the ``events`` attribute of the file:
Online files are written by the DataWriter (part of Jpp) and contain events, timeslices and summary slices.
Overview of offline files
"""""""""""""""""""""""""
Offline files contain data about events, hits and tracks. Based on the aanet version 2.0.0 documentation, the following tables show the definitions, types and units of the branches found in the events, hits and tracks trees. A description of the file header is also given.
.. csv-table:: events keys definitions and units
:header: "type", "name", "definition"
:widths: 20, 20, 80
"int", "id", "offline event identifier"
"int", "det_id", "detector identifier from DAQ"
"int", "mc_id", "identifier of the MC event (as found in ascii or antcc file)"
"int", "run_id", "DAQ run identifier"
"int", "mc_run_id", "MC run identifier"
"int", "frame_index", "from the raw data"
"ULong64_t", "trigger_mask", "trigger mask from raw data (i.e. the trigger bits)"
"ULong64_t", "trigger_counter", "trigger counter"
"unsigned int", "overlays", "number of overlaying triggered events"
"TTimeStamp", "t", "UTC time of the start of the timeslice the event came from"
"vec Hit", "hits", "list of hits"
"vec Trk", "trks", "list of reconstructed tracks (can be several because of prefits, showers, etc.)"
As with the online reader, lazy access is used. Using <TAB> completion gives an overview of the available data. Alternatively, the ``keys`` method can be used on events and on its structured data members to see what is available for reading.
Reading the reconstructed values like energy and direction of an event can be done with:
An overview of the values in the header is given in the `Overview of offline files <#overview-of-offline-files>`__.
To read the values in the header one can call them directly:
.. code-block:: python3
>>> f.header.DAQ.livetime
35.5
>>> f.header.cut_nu.Emin
1
>>> f.header.genvol.numberOfEvents
1000000.0
Reading events
""""""""""""""
To start reading events, access the ``events`` attribute of the file:
.. code-block:: python3
>>> f.events
<OfflineBranch[events]: 355 elements>
As with the online reader, lazy access is used. Using <TAB> completion gives an overview of the available data. Alternatively, the ``keys`` method can be used on events and on its structured data members to see what is available for reading.
Since reconstruction stages can be run multiple times and events can have multiple reconstructions, the vectors of reconstructed values can have variable length. Other data members, like the header, are always the same size. The definitions of data members can be found in the `definitions <https://git.km3net.de/km3py/km3io/-/tree/master/km3io/definitions>`__ folder. The definitions contain fit parameters, header information, reconstruction information and generator output, and can be expanded to include more.
To use the definitions, consider the following example: the user wants to read the MC value of the Bjorken-y of event 12, which was generated with gSeaGen. The index can be found in the `gSeaGen definitions <https://git.km3net.de/km3py/km3io/-/blob/master/km3io/definitions/w2list_gseagen.py>`__: ``"W2LIST_GSEAGEN_BY": 8,``
This value is saved in ``w2list``, so if an event is generated with gSeaGen, the value can be fetched like this:
.. code-block:: python3
>>> f.events.w2list[12][8]
0.393755
Note that w2list can also contain other values if the event is generated with another generator.
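Looking the index up by name rather than writing a bare ``8`` can be sketched as follows. This is only an illustration: ``W2LIST_GSEAGEN`` and ``w2list_event`` are hypothetical stand-ins (a plain dict holding the single entry quoted above, and a hand-made list), not objects provided by km3io:

```python
# Hypothetical stand-in for the gSeaGen definitions; only the entry quoted
# in the text above is reproduced here.
W2LIST_GSEAGEN = {"W2LIST_GSEAGEN_BY": 8}

# Hypothetical w2list of a single event: eight placeholder values followed
# by the Bjorken-y value from the example above at index 8.
w2list_event = [0.0] * 8 + [0.393755]

# Look the index up by name instead of hard-coding the number 8
by_index = W2LIST_GSEAGEN["W2LIST_GSEAGEN_BY"]
bjorken_y = w2list_event[by_index]
print(bjorken_y)
```

Referring to the entry by name makes it explicit which quantity is being read, which matters because the meaning of a given index depends on the generator.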