Commit e7e92063 authored by ViaFerrata's avatar ViaFerrata
Large update to OrcaSong, mostly usability improvements.

- Implemented a better parser and added config files as an input option to OrcaSong.
- Got rid of the ORCA_Geo_115lines.txt file; the detector geometry is now read from a .detx file that needs to be supplied with the input file.
- Improved the general readability of the code.
- Fixed the run_id readout.
- Added an option to apply no timecut.
parent 234a4a98
@@ -18,10 +18,13 @@ OrcaSong: Main Framework

.. autosummary::
    :toctree: api

    parse_input
    parser_check_input
    calculate_bin_edges
    calculate_bin_edges_test
    skip_event
    data_to_images
    main

``orcasong.file_to_hits``: Extracting event information
...
@@ -28,7 +28,7 @@ In order to use :code:`tohdf5`, you need to have loaded a jpp version first.

    /sps/km3net/users/mmoser/setenvAA_jpp9_cent_os7.sh

Additionally, you need to have a python environment on Lyon, where you have installed km3pipe (e.g. use a pyenv).
Then, the usage of :code:`tohdf5` is quite easy::

    ~$: tohdf5 -o testfile.h5 /sps/km3net/users/kmcprod/JTE_NEMOWATER/withMX/muon-CC/3-100GeV/JTE.KM3Sim.gseagen.muon-CC.3-100GeV-9.1E7-1bin-3.0gspec.ORCA115_9m_2016.99.root
@@ -138,41 +138,106 @@ After pulling the OrcaSong repo to your local harddisk you first need to install

    ~/orcasong$: pip install .

Before you can start to use OrcaSong, you need a .detx detector geometry file that corresponds to your input files.
OrcaSong currently produces event "images" based on a 1 DOM / XYZ-bin assumption. This image generation is done
automatically, based on the number of bins (n_bins) that you supply as an input for each dimension XYZ, and based on
the .detx file, which contains the DOM positions.

If your .detx file is not contained in the OrcaSong/detx_files folder, please add it to the repository!
Currently, only the 115l ORCA 2016 detx file is available.

At this point, you're finally ready to use OrcaSong. It can be executed as follows::

    ~/orcasong$: python data_to_images.py testfile.h5 geofile.detx

OrcaSong will then generate a hdf5 file with images into the Results folder, e.g. Results/4dTo4d/h5/xyzt.

The configuration options of OrcaSong can be found by calling the help::

    ~/orcasong$: python data_to_images.py -h

    Main OrcaSong code which takes raw simulated .h5 files and the corresponding .detx detector file as input in
    order to generate 2D/3D/4D histograms ('images') that can be used for CNNs.
    The input file can be calibrated or not (i.e. it may or may not already contain pos_xyz of the hits).
    Makes only 4D histograms ('images') by default.

    Usage:
        data_to_images.py [options] FILENAME DETXFILE
        data_to_images.py (-h | --help)

    Options:
        -h --help                       Show this screen.
        -c CONFIGFILE                   Load all options from a config file (.toml format).
        --n_bins N_BINS                 Number of bins that are used in the image making for each dimension, e.g. (x,y,z,t).
                                        [default: 11,13,18,60]
        --det_geo DET_GEO               Which detector geometry to use for the binning, e.g. 'Orca_115l_23m_h_9m_v'.
                                        [default: Orca_115l_23m_h_9m_v]
        --do2d                          If 2D histograms, 'images', should be created.
        --do2d_plots                    If 2D pdf plot visualizations of the 2D histograms should be created; cannot be used if do2d=False.
        --do2d_plots_n N                For how many events the 2D plot visualizations should be made.
                                        OrcaSong will exit after reaching N events. [default: 10]
        --do3d                          If 3D histograms, 'images', should be created.
        --dont_do4d                     If 4D histograms, 'images', should NOT be created.
        --do4d_mode MODE                What dimension should be used as the 4th dim in the 4D histograms.
                                        Available: 'time', 'channel_id'. [default: time]
        --timecut_mode MODE             Defines what timecut mode should be used in hits_to_histograms.py.
                                        At the moment, these cuts are only optimized for ORCA 115l neutrino events!
                                        Currently available:
                                        'timeslice_relative': Cuts out the central 30% of the snapshot.
                                        'trigger_cluster': Cuts based on the mean of the triggered hits.
                                        The timespan for this cut can be chosen in --timecut_timespan.
                                        'None': No timecut.
                                        [default: trigger_cluster]
        --timecut_timespan TIMESPAN     Only used with timecut_mode 'trigger_cluster'.
                                        Defines the timespan of the trigger_cluster cut.
                                        Currently available:
                                        'all': [-350ns, 850ns] -> 20ns / bin (60 bins)
                                        'tight-1': [-250ns, 500ns] -> 12.5ns / bin
                                        'tight-2': [-150ns, 200ns] -> 5.8ns / bin
                                        [default: tight-1]
        --do_mc_hits                    If only the mc_hits (no BG hits!) should be used for the image processing.
        --data_cut_triggered            If non-triggered hits should be thrown away for the images.
        --data_cut_e_low E_LOW          Cut events that are lower than the specified threshold value in GeV.
        --data_cut_e_high E_HIGH        Cut events that are higher than the specified threshold value in GeV.
        --data_cut_throw_away FRACTION  Throw away a random fraction (percentage) of events. [default: 0.00]
        --prod_ident PROD_IDENT         Optional int identifier for the used mc production.
                                        This is useful if you use events from two different mc productions,
                                        e.g. the 1-5GeV & 3-100GeV Orca 2016 MC. The prod_ident int will be saved in
                                        the 'y' dataset of the output file of OrcaSong. [default: 1]

Alternatively, they can also be found in the docs of the :code:`main()` function:

.. currentmodule:: orcasong.data_to_images

.. autosummary::
    main

Instead of passing every option to OrcaSong via the console, you can also load a .toml config file::

    ~/orcasong$: python data_to_images.py -c config.toml testfile.h5 geofile.detx

Please check out the config.toml file in the main folder of the OrcaSong repo in order to get an idea about
the structure of the config file.

If anything is still unclear after this introduction, just tell me in the deep_learning channel on chat.km3net.de or
write me an email at michael.m.moser@fau.de, so that I can improve this guide!

...
# filename: config.toml
# A config file for OrcaSong with multiple configurations.
# Uncomment the config that you want to use!
# More info about the .toml format at https://github.com/toml-lang/toml
### All available options with some dummy values
# --n_bins = '11,13,18,60'
# --det_geo = 'Orca_115l_23m_h_9m_v'
# --do2d = false
# --do2d_plots = false
# --do2d_plots_n = 10
# --do3d = false
# --do4d = true
# --do4d_mode = 'time'
# --timecut_mode = 'trigger_cluster'
# --timecut_timespan = 'tight_1'
# --do_mc_hits = false
# --data_cut_triggered = false
# --data_cut_e_low = 3
# --data_cut_e_high = 100
# --data_cut_throw_away = 0.00
# --prod_ident = 1
####----- Collection of configurations for ORCA 115l -----####
###--- 3-100GeV ---###
## XYZ-T
--n_bins = '11,13,18,60'
--det_geo = 'Orca_115l_23m_h_9m_v'
--do4d_mode = 'time'
--timecut_mode = 'trigger_cluster'
--timecut_timespan = 'tight_1'
--prod_ident = 1
## XYZ-C
#--n_bins = '11,13,18,31'
#--det_geo = 'Orca_115l_23m_h_9m_v'
#--do4d_mode = 'channel_id'
#--timecut_mode = 'trigger_cluster'
#--timecut_timespan = 'tight_1'
#--prod_ident = 1
###--- 3-100GeV ---###
###--- 1-5GeV ---###
### only take < 3GeV events, info throw away: elec-CC 0.25, muon-CC 0.25, elec-NC: 0.00
## XYZ-T
#--n_bins = '11,13,18,60'
#--det_geo = 'Orca_115l_23m_h_9m_v'
#--do4d_mode = 'time'
#--timecut_mode = 'trigger_cluster'
#--timecut_timespan = 'tight_1'
#--data_cut_e_high = 3
#--data_cut_throw_away = 0.25 # elec-CC 0.25, muon-CC 0.25, elec-NC: 0.00
#--prod_ident = 2
## XYZ-C
#--n_bins = '11,13,18,31'
#--det_geo = 'Orca_115l_23m_h_9m_v'
#--do4d_mode = 'channel_id'
#--timecut_mode = 'trigger_cluster'
#--timecut_timespan = 'tight_1'
#--data_cut_e_high = 3
#--data_cut_throw_away = 0.25 # elec-CC 0.25, muon-CC 0.25, elec-NC: 0.00
#--prod_ident = 2
###--- 1-5GeV ---###
####----- Collection of configurations for ORCA 115l -----####
@@ -45,7 +45,6 @@ def get_time_residual_nu_interaction_mean_triggered_hits(time_interaction, hits_
        Array with trigger flags that specifies if the hit is triggered or not.
    """
    hits_time_triggered = hits_time[triggered == 1]
    t_mean_triggered = np.mean(hits_time_triggered, dtype=np.float64)
    time_residual_vertex = t_mean_triggered - time_interaction

@@ -53,7 +52,7 @@
    return time_residual_vertex


def get_event_data(event_blob, geo, do_mc_hits, data_cuts, do4d, prod_ident):
    """
    Reads a km3pipe blob which contains the information for one event and returns a hit array and a track array
    that contain all relevant information of the event.

@@ -68,9 +67,6 @@
    do_mc_hits : bool
        Tells the function if the hits (mc_hits + BG) or the mc_hits only should be parsed.
        In the case of mc_hits, the dom_id needs to be calculated from the jpp output.
    data_cuts : dict
        Specifies if cuts should be applied.
        Contains the keys 'triggered' and 'energy_lower/upper_limit' and 'throw_away_prob'.

@@ -79,6 +75,9 @@
        In the case of 'channel_id', this information needs to be included in the event_hits as well.
    prod_ident : int
        Optional int that identifies the used production, more documentation in the docs of the main function.
    geo : None/kp.Geometry
        If None, the event_blob should already contain the calibrated hit information (e.g. pos_xyz).
        Else, a km3pipe Geometry instance that contains the geometry information of the detector.

    Returns
    -------

@@ -91,11 +90,16 @@
        vertex_pos_x, vertex_pos_y, vertex_pos_z, time_residual_vertex, (prod_ident)].
    """
    p = get_primary_track_index(event_blob)  # TODO fix for mupage/random_noise

    # parse track, contains event information like mc_energy
    if 'Header' in event_blob:  # if Header exists in file, take run_id from it. Else take it from RawHeader.
        run_id = event_blob['Header'].start_run.run_id.astype('float32')
    else:
        run_id = event_blob['RawHeader'][0][0].astype('float32')

    event_id = event_blob['EventInfo'].event_id[0]
    particle_type = event_blob['McTracks'][p].type
    energy = event_blob['McTracks'][p].energy
    is_cc = event_blob['McTracks'][p].is_cc

@@ -106,12 +110,9 @@
    time_interaction = event_blob['McTracks'][p].time

    # parse hits [x,y,z,time]
    hits = event_blob['Hits'] if do_mc_hits is False else event_blob['McHits']

    if 'pos_x' not in event_blob['Hits'].dtype.names:  # check if blob already calibrated
        hits = geo.apply(hits)

    if data_cuts['triggered'] is True:

@@ -127,7 +128,7 @@
    # save collected information to arrays event_track and event_hits
    track = [event_id, particle_type, energy, is_cc, bjorkeny, dir_x, dir_y, dir_z, time_track, run_id,
             vertex_pos_x, vertex_pos_y, vertex_pos_z, time_residual_vertex]

    if prod_ident is not None: track.append(prod_ident)

    event_track = np.array(track, dtype=np.float64)
...
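The commit replaces the old ``use_calibrated_file`` flag with a direct inspection of the hit table's dtype. A minimal sketch of that pattern on plain NumPy structured arrays (the field names mirror km3pipe's, but the arrays here are hand-made stand-ins, and ``needs_calibration`` is a hypothetical helper, not an OrcaSong function):

```python
import numpy as np

def needs_calibration(hits):
    # calibrated files already carry per-hit positions (pos_x/y/z);
    # uncalibrated ones only have fields like dom_id and time
    return 'pos_x' not in hits.dtype.names

# hand-made stand-ins for the hit tables of an uncalibrated and a calibrated file
hits_uncal = np.zeros(3, dtype=[('dom_id', 'i4'), ('time', 'f8')])
hits_cal = np.zeros(3, dtype=[('pos_x', 'f8'), ('pos_y', 'f8'),
                              ('pos_z', 'f8'), ('time', 'f8')])

print(needs_calibration(hits_uncal), needs_calibration(hits_cal))
```

Checking the dtype instead of a user-supplied flag removes one way for the input description and the actual file contents to disagree.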
@@ -65,12 +65,16 @@ def get_time_parameters(event_hits, mode=('trigger_cluster', 'all'), t_start_mar
        t_start = t_mean - t_start_margin * t_diff
        t_end = t_mean + t_end_margin * t_diff

    elif mode[0] is None:
        t_start = np.amin(t)
        t_end = np.amax(t)

    else: raise ValueError('Time cut modes other than "first_triggered" or "timeslice_relative" are currently not supported.')

    return t_start, t_end
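The help text describes the 'trigger_cluster' mode as a fixed window around the mean time of the triggered hits. The sketch below illustrates that logic using the documented timespans; the function name and the exact centering are assumptions for illustration, not OrcaSong's actual implementation:

```python
import numpy as np

# timespans from the --timecut_timespan help text, in ns relative to the mean
TIMESPANS = {'all': (-350.0, 850.0), 'tight-1': (-250.0, 500.0), 'tight-2': (-150.0, 200.0)}

def trigger_cluster_window(hits_time, triggered, timespan='tight-1'):
    # center the chosen window on the mean time of the triggered hits
    t_mean = np.mean(hits_time[triggered == 1], dtype=np.float64)
    low, high = TIMESPANS[timespan]
    return t_mean + low, t_mean + high

# toy event: three triggered hits around t=100ns and one late untriggered hit
t_start, t_end = trigger_cluster_window(np.array([0.0, 100.0, 200.0, 5000.0]),
                                        np.array([1, 1, 1, 0]))
print(t_start, t_end)
```

Basing the window on triggered hits only keeps late background hits (like the 5000 ns one above) from dragging the window away from the physics signal.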
def compute_4d_to_2d_histograms(event_hits, x_bin_edges, y_bin_edges, z_bin_edges, n_bins, all_4d_to_2d_hists, timecut, event_track, do2d_plots, pdf_2d_plots):
    """
    Computes 2D numpy histogram 'images' from the 4D data and appends the 2D histograms to the all_4d_to_2d_hists list,
    [xy, xz, yz, xt, yt, zt].

@@ -89,7 +93,7 @@
        Tuple that defines what timecut should be used in hits_to_histograms.
    event_track : ndarray(ndim=2)
        Contains the relevant mc_track info for the event in order to get a nice title for the pdf histos.
    do2d_plots : bool
        If True, generate 2D matplotlib pdf histograms.
    pdf_2d_plots : mpl.backends.backend_pdf.PdfPages/None
        Either a mpl PdfPages instance or None.

@@ -117,7 +121,7 @@
                np.array(hist_yt[0], dtype=np.uint8),
                np.array(hist_zt[0], dtype=np.uint8)))

    if do2d_plots:
        # Format in classical numpy convention: x along first dim (vertical), y along second dim (horizontal)
        # Need to take that into account in convert_2d_numpy_hists_to_pdf_image()
        # transpose to get typical cartesian convention: y along first dim (vertical), x along second dim (horizontal)
...
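Each of the xy/xz/... projections is a plain 2D histogram over hit coordinates, cast to uint8 as in the diff above. A self-contained sketch of one xy 'image' with the default 11x13 binning; the random hit positions and the detector extent are stand-ins (the real bin edges come from the .detx geometry):

```python
import numpy as np

rng = np.random.default_rng(42)
n_hits = 500
pos_x = rng.uniform(-60.0, 60.0, n_hits)  # stand-in hit positions; real extents come
pos_y = rng.uniform(-60.0, 60.0, n_hits)  # from the .detx-based bin edge calculation

x_bin_edges = np.linspace(-60.0, 60.0, 11 + 1)  # 11 x-bins, as in the default n_bins
y_bin_edges = np.linspace(-60.0, 60.0, 13 + 1)  # 13 y-bins

hist_xy = np.histogram2d(pos_x, pos_y, bins=(x_bin_edges, y_bin_edges))
image_xy = np.array(hist_xy[0], dtype=np.uint8)  # same uint8 cast as in the diff
print(image_xy.shape)
```

The uint8 cast keeps the files small; it is safe as long as no single bin collects 256 or more hits, which holds for typical ORCA event sizes.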
@@ -2,3 +2,5 @@ numpy
h5py
matplotlib
km3pipe
docopt
toml