Commit a4ae2067 authored by ViaFerrata

Update docs

parent 3f772963
docs/_static/orcasong_logo_small_low_res.png

41.9 KiB

.wy-nav-content {
    max-width: none;
}
.section #basic-2-flip-flop-synchronizer{
    text-align:justify;
}
body {
    max-width: 1000px !important;
    margin: 0;
    padding-left: 1em;
}
\ No newline at end of file
@@ -198,3 +198,5 @@ epub_exclude_files = ['search.html']
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True
def setup(app):
    app.add_stylesheet('_static/style.css')
\ No newline at end of file
Getting started with OrcaSong
=============================
.. contents:: :local:
Introduction
------------
On this page, you can find a step-by-step introduction to using OrcaSong.
The guide starts with some example ROOT simulation files made with Jpp and ends with hdf5 event 'images' that can be used as input for deep neural networks.
Preprocessing
-------------
Let's suppose you have some KM3NeT simulation files in the ROOT data format, e.g.::
    /sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root
The file above contains simulated charged-current muon neutrinos from the official 2016 23m ORCA production.
Now, we want to produce neutrino event images based on this data using OrcaSong.
Conversion from .root to .h5
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
First, we have to convert the files from the .root data format to a more usable one: hdf5.
For this purpose, we can use a tool called :code:`tohdf5`, which is part of the collaboration framework :code:`km3pipe`.
In order to use :code:`tohdf5`, you need to load a Jpp version first. A ready-to-use bash script for doing this can be found at::
    /sps/km3net/users/mmoser/setenvAA_jpp9_cent_os7.sh
Then, the usage of :code:`tohdf5` is quite easy::
    ~$: tohdf5 -o testfile.h5 /sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root
    ++ tohdf5: Converting '/sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root'...
    Pipeline and module initialisation took 0.002s (CPU 0.000s).
    loading root.... /afs/.in2p3.fr/system/amd64_sl7/usr/local/root/v5.34.23/
    loading aalib... /pbs/throng/km3net/src/Jpp/v9.0.8454//externals/aanet//libaa.so
    ++ km3pipe.io.aanet.AanetPump: Reading metadata using 'JPrintMeta'
    WARNING ++ km3pipe.io.aanet.MetaParser: Empty metadata
    WARNING ++ km3pipe.io.aanet.AanetPump: No metadata found, this means no data provenance!
    --------------------------[ Blob 250 ]---------------------------
    --------------------------[ Blob 500 ]---------------------------
    --------------------------[ Blob 750 ]---------------------------
    --------------------------[ Blob 1000 ]---------------------------
    --------------------------[ Blob 1250 ]---------------------------
    --------------------------[ Blob 1500 ]---------------------------
    --------------------------[ Blob 1750 ]---------------------------
    --------------------------[ Blob 2000 ]---------------------------
    --------------------------[ Blob 2250 ]---------------------------
    --------------------------[ Blob 2500 ]---------------------------
    --------------------------[ Blob 2750 ]---------------------------
    --------------------------[ Blob 3000 ]---------------------------
    --------------------------[ Blob 3250 ]---------------------------
    EventFile io / wall time = 6.27259 / 73.9881 (8.47784 % spent on io.)
    ================================[ . ]================================
    ++ km3pipe.io.hdf5.HDF5Sink: HDF5 file written to: testfile.h5
    ============================================================
    3457 cycles drained in 75.842898s (CPU 70.390000s). Memory peak: 177.71 MB
    wall mean: 0.021790s medi: 0.019272s min: 0.015304s max: 2.823921s std: 0.049242s
    CPU mean: 0.020330s medi: 0.020000s min: 0.010000s max: 1.030000s std: 0.018179s
    ++ tohdf5: File '/sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root' was converted.
There are also some options that can be used with :code:`tohdf5`::
    ~$: tohdf5 -h
    Convert ROOT and EVT files to HDF5.
    Usage:
        tohdf5 [options] FILE...
        tohdf5 (-h | --help)
        tohdf5 --version
    Options:
        -h --help                       Show this screen.
        --verbose                       Print more output.
        --debug                         Print everything.
        -n EVENTS                       Number of events/runs.
        -o OUTFILE                      Output file (only if one file is converted).
        -j --jppy                       (Jpp): Use jppy (not aanet) for Jpp readout.
        --ignore-hits                   Don't read the hits.
        -e --expected-rows NROWS        Approximate number of events. Providing a
                                        rough estimate for this (100, 1000000, ...)
                                        will greatly improve reading/writing speed
                                        and memory usage.
                                        Strongly recommended if the table/array
                                        size is >= 100 MB. [default: 10000]
        -t --conv-times-to-jte          Converts all MC times in the file to JTE
For now though, we will just stick to the standard conversion without any options.
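If many files have to be converted, the call can be scripted. The following is a minimal sketch, assuming that :code:`tohdf5` is available on the PATH after the Jpp environment has been set up. It only uses the :code:`-o` and :code:`-n` options listed in the help text above. The helper names :code:`build_tohdf5_command` and :code:`convert` are hypothetical and not part of km3pipe.

.. code-block:: python

    import subprocess
    from pathlib import Path


    def build_tohdf5_command(root_file, out_dir=".", n_events=None):
        """Return the tohdf5 argument list for converting a single .root file."""
        # Output name: input file name with the trailing .root replaced by .h5
        out_file = Path(out_dir) / (Path(root_file).stem + ".h5")
        cmd = ["tohdf5", "-o", str(out_file)]
        if n_events is not None:
            # -n limits the number of events/runs that are read
            cmd += ["-n", str(n_events)]
        cmd.append(str(root_file))
        return cmd


    def convert(root_files, out_dir="."):
        """Convert the files one after another, stopping at the first failure."""
        for root_file in root_files:
            subprocess.run(build_tohdf5_command(root_file, out_dir), check=True)

Since :code:`-o` is only valid when a single file is converted, the sketch issues one :code:`tohdf5` call per input file.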
After this conversion, you can investigate the data structure of the hdf5 file with the command :code:`ptdump`::
    ptdump -v testfile.h5
    / (RootGroup) 'KM3NeT'
    /event_info (Table(3457,), fletcher32, shuffle, zlib(5)) 'EventInfo'
      description := {
      "weight_w4": Float64Col(shape=(), dflt=0.0, pos=0),
      "weight_w3": Float64Col(shape=(), dflt=0.0, pos=1),
      "weight_w2": Float64Col(shape=(), dflt=0.0, pos=2),
      "weight_w1": Float64Col(shape=(), dflt=0.0, pos=3),
      "run_id": Int64Col(shape=(), dflt=0, pos=4),
      "timestamp": Int64Col(shape=(), dflt=0, pos=5),
      "nanoseconds": Int64Col(shape=(), dflt=0, pos=6),
      "mc_time": Float64Col(shape=(), dflt=0.0, pos=7),
      "event_id": Int64Col(shape=(), dflt=0, pos=8),
      "mc_id": Int64Col(shape=(), dflt=0, pos=9),
      "group_id": Int64Col(shape=(), dflt=0, pos=10)}
    ...
Hdf5 files are structured into "folders"; for example, the folder that is shown above is called "event_info".
The event_info is a table with 3457 rows, one per event, and 11 named columns (in numpy terms, a structured array of shape (3457,) with 11 fields).
It stores important information for each event, e.g. the event_id or the run_id.
There is also a folder called "hits", which contains the photon hits of the detector for all events.
If you dig a little into the subfolders, you can see that they hold a lot of information about these hits,
e.g. the hit time, but no XYZ position of the hits. The only location information that you have is the dom_id and the
channel_id of a hit.
Calibrating the .h5 file
~~~~~~~~~~~~~~~~~~~~~~~~
In order to fix this, we can run another tool, :code:`calibrate`, that will add the pos_xyz information to the hdf5 datafile::
    calibrate /sps/km3net/users/mmoser/det_files/orca_115strings_av23min20mhorizontal_18OMs_alt9mvertical_v1.detx testfile.h5
As you can see, you need a .detx geometry file for this "calibration". Typically, you can find the path of this detx
file on the wiki page of the simulation production that you are using. This calibration step is optional, since OrcaSong
can also do it on the fly, using a .detx file.
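Conceptually, this calibration is a lookup: every hit carries a dom_id and a channel_id, and the geometry file provides a position for them. The following sketch illustrates the idea with plain Python dictionaries. It is not the km3pipe implementation; the function name, the field names :code:`pos_x`, :code:`pos_y`, :code:`pos_z` and all numbers are made up for illustration.

.. code-block:: python

    def add_positions(hits, geometry):
        """Attach an XYZ position to every hit via its (dom_id, channel_id).

        hits     : list of dicts with the keys 'dom_id', 'channel_id', 'time'
        geometry : dict mapping (dom_id, channel_id) -> (x, y, z)
        """
        calibrated = []
        for hit in hits:
            x, y, z = geometry[(hit["dom_id"], hit["channel_id"])]
            # Copy the hit and add the looked-up position (hypothetical field names)
            calibrated.append({**hit, "pos_x": x, "pos_y": y, "pos_z": z})
        return calibrated


    # Made-up example values, for illustration only
    geometry = {(1, 0): (-86.2, 116.9, 196.6), (1, 1): (-86.1, 116.8, 196.4)}
    hits = [{"dom_id": 1, "channel_id": 1, "time": 12.5}]
    print(add_positions(hits, geometry))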
At this point, we are now ready to start using OrcaSong for the generation of event images.
Usage of OrcaSong
-----------------
After cloning the OrcaSong repo to your local hard disk, you first need to install it with the provided setup.py::
    ~/orcasong$: pip install .
Before you can start to use OrcaSong, you need to provide it with a file that contains the XYZ positions of the DOMs.
OrcaSong currently produces event "images" based on a 1 DOM / XYZ-bin assumption. The images are generated
automatically from the number of bins (n_bins) that you supply for each dimension XYZ and from the
DOM XYZ position file. An example file for the DOM positions, "ORCA_Geo_115lines.txt", can be found in the folder
/orcasong of the OrcaSong repo. Currently, this file is hardcoded as an input for OrcaSong, so if you want to use another
detector geometry, you should include your .txt file in the main() function in "data_to_images.py".
You can generate this .txt file by taking the .det (not .detx!) file, e.g.::
    /afs/in2p3.fr/throng/km3net/detectors/orca_115strings_av23min20mhorizontal_18OMs_alt9mvertical_v1.det
Then, you need to look for the :code:`OM_cluster_data` lines::
    OM_cluster_data: 1 -86.171 116.880 196.500 -1.57080 0.00000 1.57080
    OM_cluster_data: 2 -86.171 116.880 187.100 -1.57080 0.00000 1.57080
    ...
Here, the first column is the dom_id, and the second, third and fourth columns are the XYZ position.
You need to copy this information into a .txt file such that it can be read by OrcaSong. One could automate this so
that OrcaSong finds the correct lines in the .det file by itself; however, multiple (old) conventions exist for
the .det file structure, so it may be a bit tedious. Nevertheless, contributions are very welcome! :)
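As a starting point for such an automation, the following minimal sketch extracts the dom_id and the XYZ position from lines that follow the :code:`OM_cluster_data` convention shown above. It does not handle other .det conventions. The function names are hypothetical, and the whitespace-separated output layout is an assumption that should be checked against "ORCA_Geo_115lines.txt" before use.

.. code-block:: python

    def read_dom_positions(det_lines):
        """Collect (dom_id, x, y, z) from the OM_cluster_data lines of a .det file."""
        positions = []
        for line in det_lines:
            fields = line.split()
            if not fields or fields[0] != "OM_cluster_data:":
                continue  # skip every other kind of line
            # fields[1] is the dom_id, fields[2:5] are the X, Y and Z position
            dom_id = int(fields[1])
            x, y, z = (float(value) for value in fields[2:5])
            positions.append((dom_id, x, y, z))
        return positions


    def write_dom_positions(positions, txt_path):
        """Write one 'dom_id x y z' row per DOM (assumed layout, see above)."""
        with open(txt_path, "w") as txt_file:
            for dom_id, x, y, z in positions:
                txt_file.write(f"{dom_id} {x} {y} {z}\n")


    example = [
        "OM_cluster_data: 1 -86.171 116.880 196.500 -1.57080 0.00000 1.57080",
        "OM_cluster_data: 2 -86.171 116.880 187.100 -1.57080 0.00000 1.57080",
    ]
    print(read_dom_positions(example))

For a real file, the lines can be passed directly from an open file object, e.g. :code:`read_dom_positions(open(det_path))`.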
At this point, you're finally ready to use OrcaSong. It can be executed as follows::
    python data_to_images.py testfile.h5
OrcaSong will then generate an hdf5 file with images in the Results folder, e.g. Results/4dTo4d/h5/xyzt.
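To make the notion of an event "image" more concrete: an image is a histogram of the hits of one event. The sketch below bins hits along a single axis (Z) into n_bins equal-width bins that span the DOM positions. It only illustrates the principle under these assumptions; the actual binning is done in "data_to_images.py" and may differ in detail. The function names and all numbers are hypothetical.

.. code-block:: python

    from bisect import bisect_right


    def make_bin_edges(dom_positions, n_bins):
        """Return n_bins + 1 equally spaced edges covering all DOM positions."""
        low, high = min(dom_positions), max(dom_positions)
        width = (high - low) / n_bins
        return [low + i * width for i in range(n_bins + 1)]


    def histogram(hit_positions, edges):
        """Count the hits per bin; the last bin includes its upper edge."""
        counts = [0] * (len(edges) - 1)
        for position in hit_positions:
            if position < edges[0] or position > edges[-1]:
                continue  # hit outside of the binned range
            index = min(bisect_right(edges, position) - 1, len(counts) - 1)
            counts[index] += 1
        return counts


    dom_z = [100.0, 110.0, 120.0, 130.0]  # made-up DOM Z positions
    hit_z = [100.0, 101.0, 119.0, 130.0]  # made-up hit Z positions
    edges = make_bin_edges(dom_z, n_bins=3)
    print(histogram(hit_z, edges))

A 3D or 4D image follows the same idea, with one set of bin edges per dimension (X, Y, Z and, for 4D, time).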
The configuration options of OrcaSong can be found in the :code:`main()` function.
.. currentmodule:: orcasong.data_to_images
.. autosummary::
    main
In the future, these configurations should probably be relocated to a dedicated .config file. Again, contributions and
thoughts are very welcome!
If anything is still unclear after this introduction, just tell me in the deep_learning channel on chat.km3net.de or
write me an email at michael.m.moser@fau.de, so that I can improve this guide!
@@ -5,10 +5,10 @@
Welcome to the documentation of OrcaSong!
=========================================
OrcaSong is a part of the Deep Learning efforts for the neutrino telescope KM3NeT.
Find more information about KM3NeT on http://www.km3net.org.
| OrcaSong is a part of the Deep Learning efforts for the neutrino telescope KM3NeT.
| Find more information about KM3NeT on http://www.km3net.org.
In this context, OrcaSong is a project that produces KM3NeT event images based on the raw detector data.
In this regard, OrcaSong is a project that produces KM3NeT event images based on the raw detector data.
This means that OrcaSong takes a datafile with (neutrino-) events and based on this data, it produces 2D/3D/4D 'images' (histograms).
Currently, only simulations with a hdf5 data format are supported as an input.
These event 'images' are required for some Deep Learning machine learning algorithms, e.g. Convolutional Neural Networks.
@@ -18,13 +18,14 @@ As of now, only ORCA detector simulations are supported, but ARCA geometries can
The main code for generating the images is located in orcanet/data_to_images.py.
If the simulated hdf5 files are not calibrated yet, you need to specify the directory of a .detx file in 'data_to_images.py'.
This documentation is currently WIP, and as of now, it only offers an (extensive) API documentation.
As of now, the documentation contains a small introduction to get started and a complete API documentation.
Please feel free to contact me or just open an issue on Gitlab / Github if you have any suggestions.
.. toctree::
   :maxdepth: 2
   :caption: Contents:

   getting_started
   api
@@ -148,7 +148,8 @@ def calculate_bin_edges(n_bins, det_geo, fname_geo_limits, do4d):
def main(n_bins, det_geo, do2d=False, do2d_pdf=(False, 10), do3d=False, do4d=(True, 'time'), prod_ident=None,
         timecut=('trigger_cluster', 'tight_1'), do_mc_hits=False, use_calibrated_file=False, data_cuts=None):
    """
    Main code. Reads raw .hdf5 files and creates 2D/3D histogram projections that can be used for a CNN.
    Main code with config parameters. Reads raw .hdf5 files and creates 2D/3D histogram projections that can be used
    for a CNN.
    Parameters
    ----------