Update docs

Commit a4ae2067 authored 6 years ago by ViaFerrata
--- a/docs/_static/orcasong_logo_small_low_res.png
+++ b/docs/_static/orcasong_logo_small_low_res.png
--- a/docs/_static/style.css
+++ b/docs/_static/style.css
 .wy-nav-content {
    max-width: none;
 }
+
+ .section #basic-2-flip-flop-synchronizer{
+     text-align:justify;
+}
+
+ body {
+    max-width: 1000px !important;
+    margin: 0;
+    padding-left: 1em;
+}
\ No newline at end of file
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -198,3 +198,5 @@ epub_exclude_files = ['search.html']

 # If true, `todo` and `todoList` produce output, else they produce nothing.
 todo_include_todos = True
+def setup(app):
+    app.add_stylesheet('_static/style.css')
\ No newline at end of file
--- a/docs/getting_started.rst
+++ b/docs/getting_started.rst
+Getting started with OrcaSong
+=============================
+
+.. contents:: :local:
+
+Introduction
+------------
+
+On this page, you can find a step-by-step introduction to using OrcaSong.
+The guide starts with some example ROOT simulation files made with jpp and ends with hdf5 event 'images' that can be used as input for deep neural networks.
+
+Preprocessing
+-------------
+
+Let's suppose you have some KM3NeT simulation files in the ROOT dataformat, e.g.::
+
+    /sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root
+
+The file above contains simulated charged-current muon neutrinos from the official 2016 23m ORCA production.
+Now, we want to produce neutrino event images based on this data using OrcaSong.
+
+Conversion from .root to .h5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+First, we have to convert the files from the .root data format to a more usable one: hdf5.
+For this purpose, we can use a tool called :code:`tohdf5`, which is part of the collaboration framework :code:`km3pipe`.
+In order to use :code:`tohdf5`, you need to load a jpp version first. A ready-to-use bash script for doing this can be found at::
+
+    /sps/km3net/users/mmoser/setenvAA_jpp9_cent_os7.sh
+
+
+Then, the usage of :code:`tohdf5` is quite easy::
+
+    ~$: tohdf5 -o testfile.h5 /sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root
+    ++ tohdf5: Converting '/sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root'...
+    Pipeline and module initialisation took 0.002s (CPU 0.000s).
+    loading root....  /afs/.in2p3.fr/system/amd64_sl7/usr/local/root/v5.34.23/
+    loading aalib...  /pbs/throng/km3net/src/Jpp/v9.0.8454//externals/aanet//libaa.so
+    ++ km3pipe.io.aanet.AanetPump: Reading metadata using 'JPrintMeta'
+    WARNING ++ km3pipe.io.aanet.MetaParser: Empty metadata
+    WARNING ++ km3pipe.io.aanet.AanetPump: No metadata found, this means no data provenance!
+    --------------------------[ Blob     250 ]---------------------------
+    --------------------------[ Blob     500 ]---------------------------
+    --------------------------[ Blob     750 ]---------------------------
+    --------------------------[ Blob    1000 ]---------------------------
+    --------------------------[ Blob    1250 ]---------------------------
+    --------------------------[ Blob    1500 ]---------------------------
+    --------------------------[ Blob    1750 ]---------------------------
+    --------------------------[ Blob    2000 ]---------------------------
+    --------------------------[ Blob    2250 ]---------------------------
+    --------------------------[ Blob    2500 ]---------------------------
+    --------------------------[ Blob    2750 ]---------------------------
+    --------------------------[ Blob    3000 ]---------------------------
+    --------------------------[ Blob    3250 ]---------------------------
+    EventFile io / wall time = 6.27259 / 73.9881 (8.47784 % spent on io.)
+    ================================[ . ]================================
+    ++ km3pipe.io.hdf5.HDF5Sink: HDF5 file written to: testfile.h5
+    ============================================================
+    3457 cycles drained in 75.842898s (CPU 70.390000s). Memory peak: 177.71 MB
+      wall  mean: 0.021790s  medi: 0.019272s  min: 0.015304s  max: 2.823921s  std: 0.049242s
+      CPU   mean: 0.020330s  medi: 0.020000s  min: 0.010000s  max: 1.030000s  std: 0.018179s
+    ++ tohdf5: File '/sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root' was converted.
+
+There are also some options that can be used with :code:`tohdf5`::
+
+    ~$: tohdf5 -h
+    Convert ROOT and EVT files to HDF5.
+
+    Usage:
+        tohdf5 [options] FILE...
+        tohdf5 (-h | --help)
+        tohdf5 --version
+
+    Options:
+        -h --help                       Show this screen.
+        --verbose                       Print more output.
+        --debug                         Print everything.
+        -n EVENTS                       Number of events/runs.
+        -o OUTFILE                      Output file (only if one file is converted).
+        -j --jppy                       (Jpp): Use jppy (not aanet) for Jpp readout.
+        --ignore-hits                   Don't read the hits.
+        -e --expected-rows NROWS        Approximate number of events.  Providing a
+                                        rough estimate for this (100, 1000000, ...)
+                                        will greatly improve reading/writing speed
+                                        and memory usage.
+                                        Strongly recommended if the table/array
+                                        size is >= 100 MB. [default: 10000]
+        -t --conv-times-to-jte          Converts all MC times in the file to JTE
+
+For now though, we will just stick to the standard conversion without any options.
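
Since the help text above states that :code:`-o` only applies when a single file is converted, several files are most easily handled with one :code:`tohdf5` call per file. The following is a minimal sketch, not part of km3pipe or OrcaSong: the helper name ``build_tohdf5_commands`` and the output naming scheme (``.root`` replaced by ``.h5``) are assumptions made for illustration. It only uses the :code:`-o` and :code:`-n` options listed in the help output.

```python
from pathlib import PurePosixPath


def build_tohdf5_commands(root_files, out_dir, n_events=None):
    """Return one tohdf5 argument list per input file (hypothetical helper)."""
    commands = []
    for root_file in root_files:
        # Assumed naming scheme: keep the file name, swap '.root' for '.h5'.
        out_name = PurePosixPath(root_file).with_suffix(".h5").name
        cmd = ["tohdf5", "-o", str(PurePosixPath(out_dir) / out_name)]
        if n_events is not None:
            # -n EVENTS limits the number of events, as listed in `tohdf5 -h`.
            cmd += ["-n", str(n_events)]
        # Options come first, the input file last: tohdf5 [options] FILE...
        cmd.append(root_file)
        commands.append(cmd)
    return commands
```

Each returned list can be passed to ``subprocess.run(cmd, check=True)`` in an environment where jpp and km3pipe are already loaded.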
+
+After this conversion, you can investigate the data structure of the hdf5 file with the command :code:`ptdump`::
+
+    ptdump -v testfile.h5
+    / (RootGroup) 'KM3NeT'
+    /event_info (Table(3457,), fletcher32, shuffle, zlib(5)) 'EventInfo'
+      description := {
+      "weight_w4": Float64Col(shape=(), dflt=0.0, pos=0),
+      "weight_w3": Float64Col(shape=(), dflt=0.0, pos=1),
+      "weight_w2": Float64Col(shape=(), dflt=0.0, pos=2),
+      "weight_w1": Float64Col(shape=(), dflt=0.0, pos=3),
+      "run_id": Int64Col(shape=(), dflt=0, pos=4),
+      "timestamp": Int64Col(shape=(), dflt=0, pos=5),
+      "nanoseconds": Int64Col(shape=(), dflt=0, pos=6),
+      "mc_time": Float64Col(shape=(), dflt=0.0, pos=7),
+      "event_id": Int64Col(shape=(), dflt=0, pos=8),
+      "mc_id": Int64Col(shape=(), dflt=0, pos=9),
+      "group_id": Int64Col(shape=(), dflt=0, pos=10)}
+    ...
+
+Hdf5 files are structured into "folders"; for example, the folder shown above is called "event_info".
+The event_info is a table, i.e. a one-dimensional numpy recarray with 3457 rows (one per event) and 11 fields per row,
+in which important information about each event is stored, e.g. the event_id or the run_id.
+
+There is also a folder called "hits", which contains the photon hits of the detector for all events.
+If you dig a little bit into the subfolders, you can see that a lot of information about these hits is stored,
+e.g. the hit time, but not the XYZ position of the hits. The only location information that you have is the dom_id and the
+channel_id of a hit.
+
+Calibrating the .h5 file
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to fix this, we can run another tool, :code:`calibrate`, that will add the pos_xyz information to the hdf5 datafile::
+
+    calibrate /sps/km3net/users/mmoser/det_files/orca_115strings_av23min20mhorizontal_18OMs_alt9mvertical_v1.detx testfile.h5
+
+As you can see, you need a .detx geometry file for this "calibration". Typically, you can find the path of this detx
+file on the wiki page of the simulation production that you are using. This calibration step is optional, since OrcaSong
+can also do it on the fly, using a .detx file.
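
Conceptually, this calibration is a lookup from the identifiers stored with each hit to a position taken from the geometry file. The sketch below illustrates the idea only and is not the implementation used by :code:`calibrate` or OrcaSong: the function ``add_positions`` and the dictionary layout are hypothetical, and it resolves positions per DOM only, ignoring the channel_id.

```python
def add_positions(hits, dom_positions):
    """Attach pos_x, pos_y, pos_z to each hit by looking up its dom_id.

    hits          -- list of dicts, each with at least a 'dom_id' key
    dom_positions -- dict mapping dom_id -> (x, y, z), taken from a geometry file
    """
    calibrated = []
    for hit in hits:
        x, y, z = dom_positions[hit["dom_id"]]
        # Copy the hit so that the uncalibrated input stays untouched.
        calibrated.append({**hit, "pos_x": x, "pos_y": y, "pos_z": z})
    return calibrated
```

For example, a hit with ``dom_id`` 1 and a geometry entry ``{1: (-86.171, 116.880, 196.500)}`` would gain the three position fields while keeping its time and channel_id.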
+
+At this point, we are now ready to start using OrcaSong for the generation of event images.
+
+
+Usage of OrcaSong
+-----------------
+
+After cloning the OrcaSong repo to your local hard disk, you first need to install it with the provided setup.py::
+
+    ~/orcasong$: pip install .
+
+Before you can start to use OrcaSong, you need to provide a file that contains the XYZ positions of the DOMs to OrcaSong.
+OrcaSong currently produces event "images" based on a 1 DOM / XYZ-bin assumption. This image generation is done
+automatically, based on the number of bins (n_bins) for each dimension XYZ that you supply as an input and based on the
+DOM XYZ position file. An example file for the DOM positions can be found in the folder /orcasong of the OrcaSong
+repo, "ORCA_Geo_115lines.txt". Currently, this file is hardcoded as an input for OrcaSong, so if you want to use another
+detector geometry, you should include your .txt file in the main() function in "data_to_images.py".
+You can generate this .txt file by taking the .det (not .detx!) file, e.g.::
+
+    /afs/in2p3.fr/throng/km3net/detectors/orca_115strings_av23min20mhorizontal_18OMs_alt9mvertical_v1.det
+
+Then, you need to look for the :code:`OM_cluster_data` lines::
+
+    OM_cluster_data: 1  -86.171 116.880 196.500 -1.57080 0.00000 1.57080
+    OM_cluster_data: 2  -86.171 116.880 187.100 -1.57080 0.00000 1.57080
+    ...
+
+Here, the first column is the dom_id and the second, third and fourth columns are the XYZ position.
+You need to copy this information into a .txt file, such that it can be read by OrcaSong. One could automate this such
+that OrcaSong looks for the correct lines in the .det file automatically. However, multiple (old) conventions exist for
+the .det file structure, so it may be a bit tedious. Nevertheless, contributions are very welcome! :)
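
For the line layout shown above, the extraction could be sketched as follows. This is an illustration under assumptions, not code from OrcaSong: the function names are hypothetical, only the ``OM_cluster_data`` convention from the example is handled, and the whitespace-separated ``dom_id x y z`` output layout is a guess that should be checked against "ORCA_Geo_115lines.txt" before use.

```python
def parse_om_cluster_data(lines):
    """Yield (dom_id, x, y, z) for every OM_cluster_data line."""
    for line in lines:
        fields = line.split()
        # Skip empty lines and everything that is not an OM_cluster_data entry.
        if not fields or fields[0] != "OM_cluster_data:":
            continue
        # fields[1] is the dom_id, fields[2:5] the XYZ position;
        # the remaining columns are not needed here.
        yield int(fields[1]), float(fields[2]), float(fields[3]), float(fields[4])


def write_geo_txt(det_path, txt_path):
    """Write one 'dom_id x y z' line per DOM (assumed output layout)."""
    with open(det_path) as det_file, open(txt_path, "w") as txt_file:
        for dom_id, x, y, z in parse_om_cluster_data(det_file):
            txt_file.write(f"{dom_id} {x} {y} {z}\n")
```

Keeping the parser separate from the file handling makes it easy to add further parsers for the older .det conventions mentioned above.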
+
+At this point, you're finally ready to use OrcaSong. It can be executed as follows::
+
+    python data_to_images.py testfile.h5
+
+OrcaSong will then generate an hdf5 file with images in the Results folder, e.g. Results/4dTo4d/h5/xyzt.
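
To give an idea of what such an image is, the following sketch counts the hits of one event in a regular XYZ grid. It is a simplified, pure-Python illustration and not OrcaSong's actual code: the names ``bin_index``, ``make_image`` and ``limits`` are hypothetical, only ``n_bins`` is taken from the text above, and the time dimension of the xyzt images is left out.

```python
def bin_index(value, low, high, n_bins):
    """Index of the equal-width bin in [low, high] that contains value, or None."""
    if not low <= value <= high:
        return None
    width = (high - low) / n_bins
    # A value exactly on the upper edge is counted in the last bin.
    return min(int((value - low) / width), n_bins - 1)


def make_image(hits, limits, n_bins):
    """Count hits per XYZ bin and return a nested list image[ix][iy][iz].

    hits   -- list of dicts with 'pos_x', 'pos_y', 'pos_z' (calibrated hits)
    limits -- dict mapping each of these keys to a (low, high) tuple
    n_bins -- tuple with the number of bins along X, Y and Z
    """
    nx, ny, nz = n_bins
    image = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    for hit in hits:
        idx = [bin_index(hit[axis], *limits[axis], n)
               for axis, n in zip(("pos_x", "pos_y", "pos_z"), n_bins)]
        # Hits outside the grid are dropped.
        if None not in idx:
            image[idx[0]][idx[1]][idx[2]] += 1
    return image
```

With the 1 DOM / XYZ-bin assumption mentioned earlier, ``limits`` and ``n_bins`` would be chosen such that every DOM position falls into its own bin.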
+
+The configuration options of OrcaSong can be found in the :code:`main()` function.
+
+.. currentmodule:: orcasong.data_to_images
+.. autosummary::
+    main
+
+In the future, these configurations should probably be relocated to a dedicated .config file. Again, contributions and
+thoughts are very welcome!
+If anything is still unclear after this introduction, just tell me in the deep_learning channel on chat.km3net.de or
+write me an email at michael.m.moser@fau.de, so that I can improve this guide!
+
+
+
+
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -5,10 +5,10 @@
 Welcome to the documentation of OrcaSong!
 =========================================

-OrcaSong is a part of the Deep Learning efforts for the neutrino telescope KM3NeT.
-Find more information about KM3NeT on http://www.km3net.org.
+| OrcaSong is a part of the Deep Learning efforts for the neutrino telescope KM3NeT.
+| Find more information about KM3NeT on http://www.km3net.org.

-In this context, OrcaSong is a project that produces KM3NeT event images based on the raw detector data.
+In this regard, OrcaSong is a project that produces KM3NeT event images based on the raw detector data.
 This means that OrcaSong takes a datafile with (neutrino-) events and based on this data, it produces 2D/3D/4D 'images' (histograms).
 Currently, only simulations with a hdf5 data format are supported as an input.
 These event 'images' are required for some Deep Learning machine learning algorithms, e.g. Convolutional Neural Networks.
@@ -18,13 +18,14 @@ As of now, only ORCA detector simulations are supported, but ARCA geometries can
 The main code for generating the images is located in orcanet/data_to_images.py.
 If the simulated hdf5 files are not calibrated yet, you need to specify the directory of a .detx file in 'data_to_images.py'.

-This documentation is currently WIP, and as of now, it only offers an (extensive) API documentation.
+As of now, the documentation contains a small introduction to get started and a complete API documentation.
 Please feel free to contact me or just open an issue on Gitlab / Github if you have any suggestions.

 .. toctree::
   :maxdepth: 2
   :caption: Contents:

+   getting_started
   api



--- a/orcasong/data_to_images.py
+++ b/orcasong/data_to_images.py
@@ -148,7 +148,8 @@ def calculate_bin_edges(n_bins, det_geo, fname_geo_limits, do4d):
 def main(n_bins, det_geo, do2d=False, do2d_pdf=(False, 10), do3d=False, do4d=(True, 'time'), prod_ident=None,
         timecut=('trigger_cluster', 'tight_1'), do_mc_hits=False, use_calibrated_file=False, data_cuts=None):
    """
-    Main code. Reads raw .hdf5 files and creates 2D/3D histogram projections that can be used for a CNN.
+    Main code with config parameters. Reads raw .hdf5 files and creates 2D/3D histogram projections that can be used
+    for a CNN.

    Parameters
    ----------