diff --git a/docs/_static/orcasong_logo_small_low_res.png b/docs/_static/orcasong_logo_small_low_res.png
new file mode 100644
index 0000000000000000000000000000000000000000..2f32f07a7b1951e7d884a16e67cff226250f44c8
Binary files /dev/null and b/docs/_static/orcasong_logo_small_low_res.png differ
diff --git a/docs/_static/style.css b/docs/_static/style.css
index 21cffae73300307587d0bdbf6d136f44738ddd93..11e2dffb033211f1d6fca31baec82d1c3d66be1f 100644
--- a/docs/_static/style.css
+++ b/docs/_static/style.css
@@ -1,3 +1,13 @@
 .wy-nav-content {
     max-width: none;
 }
+
+.section #basic-2-flip-flop-synchronizer {
+    text-align: justify;
+}
+
+body {
+    max-width: 1000px !important;
+    margin: 0;
+    padding-left: 1em;
+}
\ No newline at end of file
diff --git a/docs/conf.py b/docs/conf.py
index 51a337591c8a55188cd83e3889c4771cc97602e2..f08f846028c69e245eff83efe58bc81089bc40f1 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -198,3 +198,5 @@ epub_exclude_files = ['search.html']
 
 # If true, `todo` and `todoList` produce output, else they produce nothing.
 todo_include_todos = True
+def setup(app):
+    app.add_stylesheet('style.css')
\ No newline at end of file
diff --git a/docs/getting_started.rst b/docs/getting_started.rst
new file mode 100644
index 0000000000000000000000000000000000000000..a7df373c920d5bd53abb32c3dfda6c1401057b20
--- /dev/null
+++ b/docs/getting_started.rst
@@ -0,0 +1,181 @@
+Getting started with OrcaSong
+=============================
+
+.. contents:: :local:
+
+Introduction
+------------
+
+On this page, you can find a step-by-step introduction to the usage of OrcaSong.
+The guide starts with some exemplary ROOT simulation files made with Jpp and ends with hdf5 event 'images' that can be used for deep neural networks.
+
+Preprocessing
+-------------
+
+Let's suppose you have some KM3NeT simulation files in the ROOT data format, e.g.::
+
+    /sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root
+
+The file above contains simulated charged-current muon neutrinos from the official 2016 23m ORCA production.
+Now, we want to produce neutrino event images based on this data using OrcaSong.
+
+Conversion from .root to .h5
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+First, we have to convert the files from the .root data format to a more usable one: hdf5.
+For this purpose, we can use a tool called :code:`tohdf5`, which is contained in the collaboration framework :code:`km3pipe`.
+In order to use :code:`tohdf5`, you need to have loaded a Jpp version first. A ready-to-use bash script for doing this can be found at::
+
+    /sps/km3net/users/mmoser/setenvAA_jpp9_cent_os7.sh
+
+Then, the usage of :code:`tohdf5` is quite easy::
+
+    ~$: tohdf5 -o testfile.h5 /sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root
+
+    ++ tohdf5: Converting '/sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root'...
+    Pipeline and module initialisation took 0.002s (CPU 0.000s).
+    loading root.... /afs/.in2p3.fr/system/amd64_sl7/usr/local/root/v5.34.23/
+    loading aalib... /pbs/throng/km3net/src/Jpp/v9.0.8454//externals/aanet//libaa.so
+    ++ km3pipe.io.aanet.AanetPump: Reading metadata using 'JPrintMeta'
+    WARNING ++ km3pipe.io.aanet.MetaParser: Empty metadata
+    WARNING ++ km3pipe.io.aanet.AanetPump: No metadata found, this means no data provenance!
+    --------------------------[ Blob 250 ]---------------------------
+    --------------------------[ Blob 500 ]---------------------------
+    --------------------------[ Blob 750 ]---------------------------
+    --------------------------[ Blob 1000 ]---------------------------
+    --------------------------[ Blob 1250 ]---------------------------
+    --------------------------[ Blob 1500 ]---------------------------
+    --------------------------[ Blob 1750 ]---------------------------
+    --------------------------[ Blob 2000 ]---------------------------
+    --------------------------[ Blob 2250 ]---------------------------
+    --------------------------[ Blob 2500 ]---------------------------
+    --------------------------[ Blob 2750 ]---------------------------
+    --------------------------[ Blob 3000 ]---------------------------
+    --------------------------[ Blob 3250 ]---------------------------
+    EventFile io / wall time = 6.27259 / 73.9881 (8.47784 % spent on io.)
+    ================================[ . ]================================
+    ++ km3pipe.io.hdf5.HDF5Sink: HDF5 file written to: testfile.h5
+    ============================================================
+    3457 cycles drained in 75.842898s (CPU 70.390000s). Memory peak: 177.71 MB
+      wall  mean: 0.021790s  medi: 0.019272s  min: 0.015304s  max: 2.823921s  std: 0.049242s
+      CPU   mean: 0.020330s  medi: 0.020000s  min: 0.010000s  max: 1.030000s  std: 0.018179s
+    ++ tohdf5: File '/sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root' was converted.
+
+There are also some options that can be used with :code:`tohdf5`::
+
+    ~$: tohdf5 -h
+    Convert ROOT and EVT files to HDF5.
+
+    Usage:
+        tohdf5 [options] FILE...
+        tohdf5 (-h | --help)
+        tohdf5 --version
+
+    Options:
+        -h --help                  Show this screen.
+        --verbose                  Print more output.
+        --debug                    Print everything.
+        -n EVENTS                  Number of events/runs.
+        -o OUTFILE                 Output file (only if one file is converted).
+        -j --jppy                  (Jpp): Use jppy (not aanet) for Jpp readout.
+        --ignore-hits              Don't read the hits.
+        -e --expected-rows NROWS   Approximate number of events. Providing a
+                                   rough estimate for this (100, 1000000, ...)
+                                   will greatly improve reading/writing speed
+                                   and memory usage.
+                                   Strongly recommended if the table/array
+                                   size is >= 100 MB. [default: 10000]
+        -t --conv-times-to-jte     Converts all MC times in the file to JTE
+
+For now though, we will just stick to the standard conversion without any options.
+
+After this conversion, you can investigate the data structure of the hdf5 file with the command :code:`ptdump`::
+
+    ptdump -v testfile.h5
+    / (RootGroup) 'KM3NeT'
+    /event_info (Table(3457,), fletcher32, shuffle, zlib(5)) 'EventInfo'
+      description := {
+      "weight_w4": Float64Col(shape=(), dflt=0.0, pos=0),
+      "weight_w3": Float64Col(shape=(), dflt=0.0, pos=1),
+      "weight_w2": Float64Col(shape=(), dflt=0.0, pos=2),
+      "weight_w1": Float64Col(shape=(), dflt=0.0, pos=3),
+      "run_id": Int64Col(shape=(), dflt=0, pos=4),
+      "timestamp": Int64Col(shape=(), dflt=0, pos=5),
+      "nanoseconds": Int64Col(shape=(), dflt=0, pos=6),
+      "mc_time": Float64Col(shape=(), dflt=0.0, pos=7),
+      "event_id": Int64Col(shape=(), dflt=0, pos=8),
+      "mc_id": Int64Col(shape=(), dflt=0, pos=9),
+      "group_id": Int64Col(shape=(), dflt=0, pos=10)}
+    ...
+
+Hdf5 files are structured into "folders"; for example, the folder shown above is called "event_info".
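+
+If you prefer to look at the converted file from Python instead of with :code:`ptdump`, a minimal sketch using
+:code:`h5py` could look like this (the group and column names are the ones from the :code:`ptdump` output above;
+everything else is just an illustrative choice)::
+
+    import h5py
+
+    with h5py.File('testfile.h5', 'r') as f:
+        print(list(f.keys()))               # the "folders", e.g. 'event_info', 'hits', ...
+        event_info = f['event_info'][()]    # read the whole table as a numpy recarray
+        print(event_info.dtype.names)       # column names, e.g. 'event_id', 'run_id', ...
+        print(event_info['event_id'][:5])   # the ids of the first five events
+
+This is only meant as a quick way to peek into the data; km3pipe also ships dedicated readers for these files.
+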
+The event_info is just a two-dimensional numpy recarray with the shape (3457, 11), where important
+information is stored for each event, e.g. the event_id or the run_id.
+
+There is also a folder called "hits", which contains the photon hits of the detector for all events.
+If you dig a little bit into the subfolders, you can see that a lot of information about these hits is stored,
+e.g. the hit time, but there is no XYZ position of the hits. The only positional information that you have is the
+dom_id and the channel_id of a hit.
+
+Calibrating the .h5 file
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to fix this, we can run another tool, :code:`calibrate`, that will add the pos_xyz information to the hdf5 data file::
+
+    calibrate /sps/km3net/users/mmoser/det_files/orca_115strings_av23min20mhorizontal_18OMs_alt9mvertical_v1.detx testfile.h5
+
+As you can see, you need a .detx geometry file for this "calibration". Typically, you can find the path of this .detx
+file on the wiki page of the simulation production that you are using. This calibration step is optional, since OrcaSong
+can also do it on the fly, using a .detx file.
+
+At this point, we are ready to start using OrcaSong for the generation of event images.
+
+
+Usage of OrcaSong
+-----------------
+
+After pulling the OrcaSong repo to your local hard disk, you first need to install it with the provided setup.py::
+
+    ~/orcasong$: pip install .
+
+Before you can start to use OrcaSong, you need to provide it with a file that contains the XYZ positions of the DOMs.
+OrcaSong currently produces event "images" based on a 1 DOM / XYZ-bin assumption. This image generation is done
+automatically, based on the number of bins (n_bins) that you supply as an input for each dimension XYZ and on the
+DOM XYZ position file. An exemplary file for the DOM positions, "ORCA_Geo_115lines.txt", can be found in the /orcasong
+folder of the OrcaSong repo. Currently, this file is hardcoded as an input for OrcaSong, so if you want to use another
+detector geometry, you need to include your own .txt file in the main() function of "data_to_images.py".
+You can generate this .txt file by taking the .det (not .detx!) file, e.g.::
+
+    /afs/in2p3.fr/throng/km3net/detectors/orca_115strings_av23min20mhorizontal_18OMs_alt9mvertical_v1.det
+
+Then, you need to look for the :code:`OM_cluster_data` lines::
+
+    OM_cluster_data: 1 -86.171 116.880 196.500 -1.57080 0.00000 1.57080
+    OM_cluster_data: 2 -86.171 116.880 187.100 -1.57080 0.00000 1.57080
+    ...
+
+Here, the first column is the dom_id, and the second, third and fourth columns are the XYZ position.
+You need to copy this information into a .txt file, such that it can be read by OrcaSong. One could automate this, such
+that OrcaSong looks for the correct lines in the .det file automatically; however, multiple (old) conventions exist for
+the .det file structure, so it may be a bit tedious. Nevertheless, contributions are very welcome! :)
+
+At this point, you're finally ready to use OrcaSong. It can be executed as follows::
+
+    python data_to_images.py testfile.h5
+
+OrcaSong will then generate an hdf5 file with images in the Results folder, e.g. Results/4dTo4d/h5/xyzt.
+
+The configuration options of OrcaSong can be found in the :code:`main()` function:
+
+.. currentmodule:: orcasong.data_to_images
+.. autosummary::
+    main
+
+In the future, these configurations should probably be relocated to a dedicated .config file. Again, contributions and
+thoughts are very welcome!
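+
+To give an idea of how these options fit together, the sketch below shows a hypothetical call of :code:`main()` with
+explicit settings. The parameter names are taken from the signature of :code:`main()`, but the values are only
+placeholders and not a recommendation; in practice, you adjust the configuration in :code:`data_to_images.py` itself
+before running it as shown above::
+
+    # Hypothetical example values, to be adapted to your production!
+    n_bins = (11, 13, 18, 50)           # number of bins per dimension, here (x, y, z, t)
+    det_geo = 'Orca_115l_23m_h_9m_v'    # placeholder label for the detector geometry
+
+    main(n_bins, det_geo,
+         do2d=False,                              # no 2D projections
+         do4d=(True, 'time'),                     # 4D xyzt images
+         timecut=('trigger_cluster', 'tight_1'),  # default time cut from the signature
+         do_mc_hits=False,                        # use the regular hits, not the MC hits
+         use_calibrated_file=True,                # testfile.h5 was already calibrated above
+         data_cuts=None)
+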
+If anything is still unclear after this introduction, just tell me in the deep_learning channel on chat.km3net.de or
+write me an email at michael.m.moser@fau.de, so that I can improve this guide!
diff --git a/docs/index.rst b/docs/index.rst
index 351d5f025e14285c53f3bfc6a6c5ce7c7b05ec1c..08875024be546f92ae54babfc84baea1ab5956c7 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -5,10 +5,10 @@
 Welcome to the documentation of OrcaSong!
 =========================================
 
-OrcaSong is a part of the Deep Learning efforts for the neutrino telescope KM3NeT.
-Find more information about KM3NeT on http://www.km3net.org.
+| OrcaSong is a part of the Deep Learning efforts for the neutrino telescope KM3NeT.
+| Find more information about KM3NeT on http://www.km3net.org.
 
-In this context, OrcaSong is a project that produces KM3NeT event images based on the raw detector data.
+In this regard, OrcaSong is a project that produces KM3NeT event images based on the raw detector data.
 This means that OrcaSong takes a datafile with (neutrino-) events and based on this data, it produces 2D/3D/4D 'images' (histograms).
 Currently, only simulations with a hdf5 data format are supported as an input.
 These event 'images' are required for some Deep Learning machine learning algorithms, e.g. Convolutional Neural Networks.
@@ -18,13 +18,14 @@ As of now, only ORCA detector simulations are supported, but ARCA geometries can
 The main code for generating the images is located in orcanet/data_to_images.py.
 If the simulated hdf5 files are not calibrated yet, you need to specify the directory of a .detx file in 'data_to_images.py'.
 
-This documentation is currently WIP, and as of now, it only offers an (extensive) API documentation.
+As of now, the documentation contains a small introduction to get started and a complete API documentation.
 Please feel free to contact me or just open an issue on Gitlab / Github if you have any suggestions.
 
 .. toctree::
    :maxdepth: 2
    :caption: Contents:
 
+   getting_started
    api
 
diff --git a/orcasong/data_to_images.py b/orcasong/data_to_images.py
index 1fa7cb5b5690bd28c945ce9f536c4917386843c1..7ea83a54e53cd40848c7451e45331f7e834f87f8 100644
--- a/orcasong/data_to_images.py
+++ b/orcasong/data_to_images.py
@@ -148,7 +148,8 @@ def calculate_bin_edges(n_bins, det_geo, fname_geo_limits, do4d):
 def main(n_bins, det_geo, do2d=False, do2d_pdf=(False, 10), do3d=False, do4d=(True, 'time'), prod_ident=None,
          timecut=('trigger_cluster', 'tight_1'), do_mc_hits=False, use_calibrated_file=False, data_cuts=None):
     """
-    Main code. Reads raw .hdf5 files and creates 2D/3D histogram projections that can be used for a CNN.
+    Main code with config parameters. Reads raw .hdf5 files and creates 2D/3D histogram projections that can be used
+    for a CNN.
 
     Parameters
     ----------