Draft: Resolve "Reading summary slices and extracting the rates of each optical module"
Closes #93 (closed)
assigned to @rgracia
added 2 commits
I figured out how to read the data with `uproot4` in what is probably the most efficient way (this should be close to, or even faster than, the original ROOT/C++ implementation): we need to pass the interpretation as an `expression` dictionary, where the branch names are the keys and the interpretations are the values. In the example below I only passed the interpretation for the summary slices (which live on the `"vector<KM3NETDAQ::JDAQSummaryFrame>"` branch), but we should also provide the interpretations for the header etc.

It takes 3.5 seconds to iterate through 100k summaryslices, reading the full data into memory in steps of 10000 summaryslices (the `step_size` used below). This means that the step size sets the maximum memory footprint, and inside the loop you can do the calculations for a whole chunk of summaryslices at once, which is perfect for vectorised operations.
Also note that in your current merge request you first collect a huge dictionary and then dump it into an HDF5 file at the end of the loop. This might be too much for a large ROOT file, such as one from a full detector, since all the values have to be held in memory before they are actually written to disk. With this step-wise approach, you can write the data chunk by chunk into the HDF5 file, so the memory footprint stays constant and under control.
    import uproot

    f = uproot.open("/Users/tamasgal/Data/KM3NeT/sea/KM3NeT_00000094_00010347.root")

    summaryslices_addr = "KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE"
    branch_name = "vector<KM3NETDAQ::JDAQSummaryFrame>"

    # One record per summary frame: six header fields followed by 31 channel bytes.
    interpretation = uproot.interpretation.jagged.AsJagged(
        uproot.interpretation.numerical.AsDtype(
            [
                ("dom_id", ">i4"),
                ("dq_status", "u4"),
                ("hrv", "<u4"),
                ("fifo", "<u4"),
                ("status3", "u4"),
                ("status4", "u4"),
            ]
            + [(f"ch{c}", "u1") for c in range(31)]
        ),
        header_bytes=10,
    )

    # Branch name -> interpretation
    expressions = {branch_name: interpretation}

    for summaryslices in f[summaryslices_addr].iterate(expressions, step_size=10000):
        ss = summaryslices[branch_name]
        print(f"{len(ss)} summaryslices: {ss}")
gives the following output:
    10000 summaryslices: [[{dom_id: 808447066, dq_status: 218106880, hrv: 128, ... ch29: 48, ch30: 46}]]
    10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 49, ch30: 44}]]
    10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 48, ch30: 45}]]
    10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 48, ch30: 44}]]
    10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 45, ch30: 43}]]
    10000 summaryslices: [[{dom_id: 806483362, dq_status: 251661824, hrv: 128, ... ch29: 49, ch30: 45}]]
    10000 summaryslices: [[{dom_id: 806451584, dq_status: 251661824, hrv: 128, ... ch29: 44, ch30: 41}]]
    10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 47, ch30: 44}]]
    10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 49, ch30: 45}]]
    10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 50, ch30: 44}]]
    7895 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 48, ch30: 43}]]
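To illustrate why fixed-size chunks suit vectorised operations, here is a minimal sketch using plain NumPy on synthetic data. It assumes each frame has the field layout from the interpretation above (byte order simplified); the helper name `channel_matrix` and the random test data are made up for illustration, and real chunks from `uproot` arrive as awkward arrays that would first have to be flattened.

```python
import numpy as np

N_CHANNELS = 31

# Same fields as in the interpretation above (byte order simplified for the sketch).
FRAME_DTYPE = np.dtype(
    [
        ("dom_id", "<i4"),
        ("dq_status", "<u4"),
        ("hrv", "<u4"),
        ("fifo", "<u4"),
        ("status3", "<u4"),
        ("status4", "<u4"),
    ]
    + [(f"ch{c}", "u1") for c in range(N_CHANNELS)]
)


def channel_matrix(frames):
    """Return the raw channel bytes as an (n_frames, 31) array (hypothetical helper)."""
    return np.stack([frames[f"ch{c}"] for c in range(N_CHANNELS)], axis=1)


# Synthetic stand-in for one chunk of flattened summary frames.
rng = np.random.default_rng(42)
frames = np.zeros(1000, dtype=FRAME_DTYPE)
frames["dom_id"] = rng.choice([806451584, 806483362, 808447066], size=len(frames))
for c in range(N_CHANNELS):
    frames[f"ch{c}"] = rng.integers(0, 256, size=len(frames), dtype=np.uint8)

raw = channel_matrix(frames)                 # shape (1000, 31)
active_per_frame = (raw > 0).sum(axis=1)     # channels with a non-zero value, per frame

# Group the per-frame values by module without a Python loop over frames.
dom_ids, inverse = np.unique(frames["dom_id"], return_inverse=True)
counts = np.bincount(inverse)
mean_active_per_dom = np.bincount(inverse, weights=active_per_frame) / counts
```

The same grouping pattern works for any per-frame quantity, so the loop body only ever touches one chunk of frames at a time.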
Edited by Tamas Gal

I have committed a new version of the script that uses this feature. It is now much faster: processing the same file on the same computer takes about 10-20 seconds, whereas before it took about 4 minutes.
By the way, what are the units returned by the `get_rate()` function in `km3io`? Please tell me they're Hz...
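For context on the units question, here is a sketch of a logarithmic byte-to-rate decoding of the kind typically used for summary-frame channel values. The constants (2 kHz to 2 MHz), the treatment of 0 as "no rate", and the function name `byte_to_rate_hz` are assumptions for illustration and are not taken from this thread; they should be checked against the `km3io` source before being relied upon.

```python
import numpy as np

# Assumed encoding range; verify against the km3io / Jpp sources.
MIN_RATE_HZ = 2.0e3
MAX_RATE_HZ = 2.0e6
FACTOR = np.log(MAX_RATE_HZ / MIN_RATE_HZ) / 255


def byte_to_rate_hz(values):
    """Decode raw channel bytes (0..255) into rates in Hz (hypothetical helper).

    A raw value of 0 is mapped to 0 Hz; all other values are spread
    logarithmically between MIN_RATE_HZ and MAX_RATE_HZ.
    """
    values = np.asarray(values, dtype=np.float64)
    rates = MIN_RATE_HZ * np.exp(values * FACTOR)
    return np.where(values == 0, 0.0, rates)


rates = byte_to_rate_hz([0, 1, 128, 255])
```

Because the function accepts arrays, it can be applied to a whole chunk of channel bytes in one call.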
added 9 commits
added 18 commits
- 231c97d3 - Update changelog
- 7a457b14 - Add length to SummarysliceReader
- 8ad906d3 - Fix length determination for summaryslices
- 84e4fbbc - Deprecate old Summaryslice reader and introduce the new one
- 589cc961 - Update docs on online stuff
- 3e5ba7b9 - Cleanup and preparation for event reader rewrite
- 07efad81 - Generalise summaryslice reader into a "chunks reader"
- 385b5228 - Fix docs
- e929afa4 - Update plot_online_example.py
- 594a767b - Update plot_online_example.py
- 20819a7d - Add getitem to SummarysliceReader
- ff3bbad6 - add script to extract dom rates
- aaa0513f - fix small bugs
- e7c44dd7 - another small fix
- 7e1cda92 - rename script
- 6230d58c - rename script
- e1b2f654 - blackened
- ec5d2278 - Merge branch...
added 1 commit
- 1508e51f - redo the reading with the chunks to memory approach.
Perfect. I will probably modify the script to change the contents and structure of the output file (I may also need to add HRV information for each module). That's why, for the moment, the function `append_to_file` looks far from ideal.

For the input options, I've chosen the same convention as used in `Jpp` for consistency, but I've always thought they're not the most intuitive ones (`-f` for input file, `-a` for detector file, `-o` for the output file). Are these OK, or should we use other ones?
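One possible alternative is to use more self-describing flags. The sketch below uses the standard-library `argparse` purely for illustration (the script in this merge request may use a different option parser); the flag names, the function name `parse_args`, and the default output name derived from the input file are assumptions, not a decision made in this thread.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line of a rate-extraction script (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Extract DOM rates from summary slices."
    )
    parser.add_argument("-i", "--input-file", required=True,
                        help="ROOT file containing the summary slices")
    parser.add_argument("-o", "--output-file", default=None,
                        help="HDF5 output file (default: derived from the input name)")
    parser.add_argument("-d", "--detector-file", default=None,
                        help="detector file (optional)")
    args = parser.parse_args(argv)

    # Fall back to "<input stem>.rates.hdf5" when no output file is given.
    if args.output_file is None:
        args.output_file = Path(args.input_file).stem + ".rates.hdf5"
    return args


args = parse_args(["-i", "/data/run_10347.root"])
```

Keeping the short flags next to long ones lets existing `Jpp` users keep their habits where the letters coincide, while the long names document themselves.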
added 2 commits
     """
    -Usage: extract-dom-rates.py -f INPUT_FILE -o OUTPUT_FILE -a DETECTOR_FILE
    +Usage: extract-dom-rates.py -i INPUT_FILE [-o OUTPUT_FILE -d DETECTOR_FILE]
    [...]
     for key in arguments:
         data[key.replace("-", "")] = arguments[key]

    +if (data["output_file"]==None):
    +    data["output_file"]=Path(arguments['--input_file']).stem+'.rates.hdf5'
     # Read list of modules from detector file, and map to indices.
    -detector = km3pipe.hardware.Detector(data["detector_file"])
    +
    +reader = km3io.online.SummarysliceReader(data["input_file"], 10000)
    +
    +dom_ids = []
    +
    +if (data["detector_file"]==None):
    +    print("No detector file supplied, analysing summaryslices to check for available module IDs.")
    +    for s in reader.summaryslices:
    [...]
     h5.create_dataset("frame_indices", (0,), maxshape=(None,))

     # Read the channel rates from the summary slices, calculate the total module rates, and save them to the output file
    -reader = km3io.online.SummarysliceReader(data["input_file"], 10000)

     for ss_chunk in reader:
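The `maxshape=(None,)` dataset in the diff above is what makes chunk-wise writing possible: a resizable dataset can grow with every chunk, so only one chunk is in memory at a time. A minimal sketch with `h5py` follows; the helper name `append_chunk`, the dataset name `rates`, and the synthetic chunks are assumptions for illustration and do not reproduce the script's `append_to_file`.

```python
import h5py
import numpy as np


def append_chunk(dataset, values):
    """Grow a 1D resizable dataset and write `values` at its end (hypothetical helper)."""
    values = np.asarray(values)
    start = dataset.shape[0]
    dataset.resize(start + len(values), axis=0)
    dataset[start:] = values


# In-memory HDF5 file so the sketch leaves nothing on disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as h5:
    rates = h5.create_dataset("rates", (0,), maxshape=(None,), dtype="f8")
    for step in range(3):
        # Stand-in for the values computed from one chunk of summaryslices.
        chunk = np.full(1000, float(step))
        append_chunk(rates, chunk)
    total = rates.shape[0]
    last_value = rates[-1]
```

In a real run the loop would iterate over the reader's chunks instead of `range(3)`, and the file would be opened on disk without the `driver` arguments.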