writing options/ideas to add to km3io
@tgal this is just a discussion issue to get your feedback. A small investigation of the writing options (to ASCII, ROOT and HDF5 files) led me to the following:
- Uproot has a limited ability to write ROOT files, including TTrees of flat data (non-jagged: a single number per event), a variety of histogram types, and TObjString (for metadata). Uproot can only write TTrees whose branches are basic types (integers and floating-point numbers). The source of this info is here for writing TTrees and here for writing histograms.
So if we take the case of reading data from multiple files for high-level analysis:
- if we only want to write the data of the best-reconstructed track for each event, then we are fine, because this data is not jagged. But we will not be able to write a big hits class that combines all the hits trees from all the files (maybe we don't even need that?).
So it seems we can write the events tree data and the best-reconstructed track data from multiple files into one big ROOT file, but the hits class is a problem ...
- For HDF5 files, I guess there are fewer limitations. I tried to find the best package to use in Python to write HDF5 files and I found this. What do you think? If you usually use another package, please let me know :) PS: I tried to look at your hdf5 module in km3pipe but I couldn't tell clearly what you use to write HDF5 files (sorry for my ignorance here).
- For the ASCII file format, the problem is always nested data. I am not sure ASCII is the best-suited format for nested data, BUT it can be useful for storing some high-level data. I previously tested/implemented the following idea: so far in Offline files, each event has multiple possible tracks. If you want to store one of the features of the tracks (let's say the likelihood), you can separate the nesting levels with hash signs (##).
if we have 2 events, for example, the file content will look like this:
# header: simply to say which data is this, for example, tracks likelihoods
## event id
data
data
data
...
## following event id
data
data
....
I already implemented this and this file is an example of the output ;)
Please note that this idea is at a very alpha stage, and I would like to know what you think about it (because I personally think the csv file looks ugly, but if we have an additional method read_km3io_csv() to get the data in a jagged array (using the awkward package), then it's ok?)
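To make the format above concrete, here is a minimal sketch of a writer/reader pair using only the Python standard library. The name `write_km3io_csv` and both signatures are assumptions made for illustration, not existing km3io functions, and the reader returns plain nested lists rather than an awkward array. It also assumes event ids are integers and that each data line holds a single number.

```python
import io


def write_km3io_csv(stream, header, events):
    """Write {event_id: [values, ...]} in the '#'/'##' layout.

    Hypothetical helper: name and signature are assumptions.
    """
    stream.write(f"# {header}\n")
    for event_id, values in events.items():
        stream.write(f"## {event_id}\n")
        for value in values:
            # repr() keeps full float precision for the round trip
            stream.write(f"{value!r}\n")


def read_km3io_csv(stream):
    """Read the layout back into (header, event_ids, nested_lists)."""
    header = None
    event_ids = []
    data = []
    for raw_line in stream:
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("##"):
            # a new event starts: open a new (possibly empty) sub-list
            event_ids.append(int(line[2:].strip()))
            data.append([])
        elif line.startswith("#"):
            header = line[1:].strip()
        else:
            if not data:
                raise ValueError("data line found before any '## event id' line")
            data[-1].append(float(line))
    return header, event_ids, data


# Round trip through an in-memory file with 2 events
buffer = io.StringIO()
write_km3io_csv(buffer, "tracks likelihoods", {0: [1.5, 2.25, 3.0], 1: [4.5, 5.0]})
buffer.seek(0)
header, event_ids, likelihoods = read_km3io_csv(buffer)
```

The nested lists returned here have one sub-list per event, so they could then be handed to the awkward package to build the jagged array.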
- Finally, I am also aware that there are some cool writing options in numpy and pandas that one can make good use of. But they do not work with nested data (again).
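One possible workaround, sketched here only as an idea and not something that is implemented in km3io: flatten the nested data into two flat arrays (all values concatenated, plus the number of values per event), which numpy can write without problems. The helper names `save_jagged` and `load_jagged` are made up for this example.

```python
import os
import tempfile

import numpy as np


def save_jagged(path, events):
    """Store a list of variable-length sequences as two flat arrays.

    Hypothetical helper: name and layout are assumptions.
    """
    counts = np.array([len(e) for e in events], dtype=np.int64)
    if len(events) > 0:
        content = np.concatenate([np.asarray(e, dtype=np.float64) for e in events])
    else:
        content = np.empty(0, dtype=np.float64)
    np.savez(path, content=content, counts=counts)


def load_jagged(path):
    """Rebuild the per-event arrays from the flat content and counts."""
    with np.load(path) as archive:
        content = archive["content"]
        counts = archive["counts"]
    if len(counts) == 0:
        return []
    # split points are the cumulative counts, excluding the final total
    return np.split(content, np.cumsum(counts)[:-1])


# Three events, the second one without any tracks
likelihoods = [[1.5, 2.25, 3.0], [], [4.5, 5.0]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "likelihoods.npz")
    save_jagged(path, likelihoods)
    restored = load_jagged(path)
restored_lists = [r.tolist() for r in restored]
```

The price is that the file is no longer readable "as is": whoever reads it needs to know about the counts array to rebuild the events.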
Please let me know, Thank you!