The km3io Python package

This software provides a set of Python classes to read KM3NeT ROOT files without having ROOT, Jpp or aanet installed. It only depends on Python 3.5+ and the amazing uproot package, and gives you access to the data via NumPy arrays.

It's very easy to use and, according to the uproot benchmarks, it can outperform ROOT's own I/O.

Note: Beware that this package is in the development phase, so the API will change until version 1.0.0 is released!

Installation

Install km3io using pip:

pip install km3io

To get the latest (stable) development release:

pip install git+https://git.km3net.de/km3py/km3io.git

Reminder: km3io is not dependent on aanet, ROOT or Jpp!

Questions

If you have a question about km3io, please proceed as follows:

  • Read the documentation below.
  • Explore the examples in the documentation.
  • If you haven't found an answer to your question in the documentation, post a git issue with your question, showing us an example of what you have tried first and what you would like to do.
  • If you have noticed a bug, please report it in a git issue. We appreciate your contribution.

Tutorial

Introduction

Most KM3NeT data is stored in ROOT files. These ROOT files are created with either the Jpp or the aanet software. A ROOT file created with Jpp is often referred to as "a Jpp ROOT file". Similarly, a ROOT file created with aanet is often referred to as "an aanet file". In km3io, an aanet ROOT file is always referred to as an offline file, while a Jpp ROOT file is always referred to as a daq file.

km3io is a Python package that provides a set of classes (DaqReader and OfflineReader) to read both daq ROOT files and offline ROOT files without any dependency on aanet, Jpp or ROOT.

Data in km3io is often returned as a "lazyarray", a "jagged lazyarray", a "jagged array" or a NumPy array. A lazyarray is an array-like object that reads data on demand. In a lazyarray, only the first and the last chunks of data are read into memory. A lazyarray can be used with all of NumPy's universal functions. Here is what a lazyarray looks like:

# <ChunkedArray [5971 5971 5971 ... 5971 5971 5971] at 0x7fb2341ad810>
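To make the "read on demand" idea more concrete, here is a minimal toy sketch in plain Python and NumPy. It is not how uproot or km3io implement lazyarrays; the LazyChunks class and the load_chunk function are hypothetical names invented for this illustration, and the "disk read" is simulated by filling a chunk with the run ID 5971.

```python
import numpy as np


class LazyChunks:
    """Toy read-on-demand array: a chunk is loaded only when it is indexed."""

    def __init__(self, loaders, chunk_size):
        self._loaders = loaders        # one zero-argument callable per chunk
        self._chunk_size = chunk_size  # number of entries in every chunk
        self._cache = {}               # chunks that have been loaded so far

    def __len__(self):
        return len(self._loaders) * self._chunk_size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        chunk_id, offset = divmod(index, self._chunk_size)
        if chunk_id not in self._cache:
            # The (simulated) read happens here, on first access only.
            self._cache[chunk_id] = self._loaders[chunk_id]()
        return self._cache[chunk_id][offset]

    @property
    def loaded_chunks(self):
        return sorted(self._cache)


def load_chunk():
    # Hypothetical stand-in for reading one chunk of a branch from a file.
    return np.full(4, 5971)


run_id = LazyChunks([load_chunk] * 3, chunk_size=4)
first_value = run_id[0]    # loads chunk 0 only
last_value = run_id[11]    # loads chunk 2; chunk 1 is never read
print(run_id.loaded_chunks)
```

Only the chunks that were actually indexed end up in memory, which is the behaviour described above for the first and last chunks.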

A jagged array is an array with two or more dimensions whose sub-arrays have different lengths. In other words, a jagged array is an array of arrays of different sizes. A jagged lazyarray is therefore simply a jagged array of lazyarrays with different sizes. Here is what a jagged lazyarray looks like:

# <JaggedArray [[102 102 102 ... 11517 11518 11518] [] [101 101 102 ... 11518 11518 11518] ... [101 101 102 ... 11516 11516 11517] [] [101 101 101 ... 11517 11517 11518]] at 0x7f74b0ef8810>
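One common way to picture a jagged array is as a single flat array plus a list of offsets that mark where each sub-array starts and ends. The sketch below uses plain NumPy with made-up values (three events with 3, 0 and 2 entries); the names content, offsets and event are chosen for this illustration and are not part of the km3io API.

```python
import numpy as np

# Three "events" with 3, 0 and 2 entries, stored as one flat array.
content = np.array([102, 102, 11517, 101, 11518])
# Event i spans content[offsets[i]:offsets[i + 1]].
offsets = np.array([0, 3, 3, 5])


def event(i):
    """Return the sub-array that belongs to event i."""
    return content[offsets[i]:offsets[i + 1]]


counts = np.diff(offsets)  # number of entries in each event
print(counts)
print(event(0), event(1), event(2))
```

The empty sub-arrays shown as [] in the output above correspond to events like event(1) here, where two consecutive offsets are equal.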

Overview of daq files

# info needed here

Overview of offline files

# info needed here

Daq files reader

# an update is needed here?

Currently only events (the KM3NET_EVENT tree) are supported, but timeslices and summaryslices will be implemented very soon.

Let's have a look at some ORCA data (KM3NeT_00000044_00005404.root).

To get a lazy ragged array of the events:

import km3io as ki
events = ki.JppReader("KM3NeT_00000044_00005404.root").events

That's it! Now let's have a look at the hits data:

events
# Number of events: 17023
events[23].snapshot_hits.tot
# array([28, 22, 17, 29,  5, 27, 24, 26, 21, 28, 26, 21, 26, 24, 17, 28, 23, 29, 27, 24, 23, 26, 29, 25, 18, 28, 24, 28, 26, 20, 25, 31, 28, 23, 26, 21, 30, 33, 27, 16, 23, 24, 19, 24, 27, 22, 23, 21, 25, 16, 28, 22, 22, 29, 24, 29, 24, 24, 25, 25, 21, 31, 26, 28, 30, 42, 28], dtype=uint8)
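Since the ToT values come back as a plain NumPy array, the usual NumPy operations apply. The sketch below does not read a file: it copies the first ten values from the output above into an array and filters them with a boolean mask. The threshold of 25 is an arbitrary value chosen for illustration.

```python
import numpy as np

# First ten ToT values copied from the output above.
tot = np.array([28, 22, 17, 29, 5, 27, 24, 26, 21, 28], dtype=np.uint8)

high_tot = tot[tot > 25]  # boolean mask keeps the hits with ToT above 25
mean_tot = tot.mean()     # NumPy accumulates integer means in float64

print(high_tot)
print(mean_tot)
```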

Offline files reader

Let's have a look at some muon data from an ORCA 4-line simulation, run ID 5971 (datav6.0test.jchain.aanet.00005971.root).

To get a lazy ragged array of all data:

>>> import km3io as ki
>>> reader = ki.AanetReader('datav6.0test.jchain.aanet.00005971.root')

That's it! Now let's take a look at all the available branches in our file:

>>> reader
Number of events: 145028
Events keys are:
  id
  det_id
  mc_id
  run_id
  mc_run_id
  frame_index
  trigger_mask
  trigger_counter
  overlays
  hits
  trks
  w
  w2list
  w3list
  mc_t
  mc_hits
  mc_trks
  comment
  index
  flags
  t.fSec
  t.fNanoSec
Hits keys are:
  hits.id
  hits.dom_id
  hits.channel_id
  hits.tdc
  hits.tot
  hits.trig
  hits.pmt_id
  hits.t
  hits.a
  hits.pos.x
  hits.pos.y
  hits.pos.z
  hits.dir.x
  hits.dir.y
  hits.dir.z
  hits.pure_t
  hits.pure_a
  hits.type
  hits.origin
  hits.pattern_flags
Tracks keys are:
  trks.fUniqueID
  trks.fBits
  trks.usr_data
  trks.usr_names
  trks.id
  trks.pos.x
  trks.pos.y
  trks.pos.z
  trks.dir.x
  trks.dir.y
  trks.dir.z
  trks.t
  trks.E
  trks.len
  trks.lik
  trks.type
  trks.rec_type
  trks.rec_stages
  trks.status
  trks.mother_id
  trks.fitinf
  trks.hit_ids
  trks.error_matrix
  trks.comment

Now that you have seen all the available branches, you can choose any of the keys above (a key is a branch name) and display the corresponding data. For example, let's check that we are indeed reading data from run 5971:

>>> reader['run_id']
<ChunkedArray [5971 5971 5971 ... 5971 5971 5971] at 0x7fb2341ad810>

Let's look at the number of hits and tracks in event number 5:

>>> reader[5]['hits']
60
>>> reader[5]['trks']
56

So event 5 has exactly 60 hits and 56 tracks. Let's explore the hits and tracks data of event 5 in more detail:

>>> reader['hits.dom_id'][5]
array([806455814, 806487219, 806487219, 806487219, 806487226, 808432835,
   808432835, 808432835, 808432835, 808432835, 808432835, 808432835,
   808451904, 808451904, 808451907, 808451907, 808469129, 808469129,
   808469129, 808493910, 808949744, 808949744, 808951460, 808951460,
   808956908, 808961655, 808964908, 808969848, 808969857, 808972593,
   808972593, 808972598, 808972598, 808972698, 808972698, 808974758,
   808974811, 808976377, 808981510, 808981523, 808981812, 808982005,
   808982005, 808982018, 808982077, 808982077, 808982547, 809007627,
   809521500, 809521500, 809521500, 809524432, 809526097, 809526097,
   809526097, 809526097, 809526097, 809526097, 809526097, 809544058],
  dtype=int32)

One can access the dom_id for the first hit in event 5 as follows:

>>> reader['hits.dom_id'][5][0]
806455814
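Because the dom_id values of an event form an ordinary NumPy array, you can, for example, count how many hits each DOM recorded. The sketch below does not read the file: it copies the first twelve dom_id values of event 5 from the output above and uses numpy.unique. The variable names are chosen for this illustration.

```python
import numpy as np

# First twelve dom_id values of event 5, copied from the output above.
dom_id = np.array([806455814, 806487219, 806487219, 806487219, 806487226,
                   808432835, 808432835, 808432835, 808432835, 808432835,
                   808432835, 808432835], dtype=np.int32)

# Unique DOM IDs (sorted) and the number of hits recorded on each of them.
doms, hits_per_dom = np.unique(dom_id, return_counts=True)
busiest_dom = doms[np.argmax(hits_per_dom)]

for dom, n_hits in zip(doms, hits_per_dom):
    print(dom, n_hits)
```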

Now let's read tracks data in event 5:

>>> reader['trks.dir.z'][5]
array([-0.60246049, -0.60246049, -0.60246049, -0.51420541, -0.5475772 ,
   -0.5772408 , -0.56068238, -0.64907684, -0.67781799, -0.66565114,
   -0.63014839, -0.64566464, -0.62691012, -0.58465493, -0.59287533,
   -0.63655091, -0.63771247, -0.73446841, -0.7456636 , -0.70941246,
   -0.66312268, -0.66312268, -0.56806477, -0.56806477, -0.66312268,
   -0.66312268, -0.74851077, -0.74851077, -0.66312268, -0.74851077,
   -0.56806477, -0.74851077, -0.66312268, -0.74851077, -0.56806477,
   -0.66312268, -0.56806477, -0.66312268, -0.56806477, -0.56806477,
   -0.66312268, -0.74851077, -0.66312268, -0.93501626, -0.56806477,
   -0.74851077, -0.66312268, -0.56806477, -0.82298389, -0.74851077,
   -0.66312268, -0.56806477, -0.82298389, -0.56806477, -0.66312268,
   -0.97094183])

One can access the 'trks.dir.z' for the first track in event 5 as follows:

>>> reader['trks.dir.z'][5][0]
-0.60246049