Draft: Resolve "Reading summary slices and extracting the rates of each optical module"

8 unresolved threads

Closes #93 (closed)

Pipeline #25365 passed with warnings

Test coverage 91.08% from 1 job
Approval is optional
Merge blocked: 2 checks failed
Merge request must not be draft.
Merge conflicts must be resolved.

Merge details

  • The source branch is 103 commits behind the target branch.
  • 18 commits and 1 merge commit will be added to master.
  • Source branch will not be deleted.

Activity

  • assigned to @rgracia

  • Rodrigo G. Ruiz added 2 commits

  • added 1 commit

  • added 1 commit

  • Rodrigo G. Ruiz added 2 commits

    • I figured out how to read the data with uproot4 in what is probably the most efficient way (this should be close to, or even faster than, the original ROOT/C++ implementation):

      We need to pass the interpretations in as an expression dictionary, where the keys are the branch names and the values are the corresponding interpretations. In the example below I only passed the interpretation for the summary slices (which live on the "vector<KM3NETDAQ::JDAQSummaryFrame>" branch), but we should also provide the interpretations for the header etc.

      It takes 3.5 seconds to iterate through 100k summaryslices, reading the full data into memory in steps of 10000 summaryslices (the step_size used below). The step size therefore determines the maximum memory footprint, and inside the loop you can do the calculations for a whole chunk of summaryslices at once, which is perfect for vectorised operations.

      Also note that in your current merge request you first collect a huge dictionary and then dump it into an HDF5 file at the end of the loop. This might be too much for a large ROOT file, e.g. from a full detector, since all the values have to be held in memory before they are written to disk. With the step-wise approach, you can write the data to the HDF5 file chunk by chunk, so the memory footprint stays constant and under control.

      import uproot
      
      f = uproot.open("/Users/tamasgal/Data/KM3NeT/sea/KM3NeT_00000094_00010347.root")
      
      summaryslices_addr = "KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE"
      branch_name = "vector<KM3NETDAQ::JDAQSummaryFrame>"

      # Each summary frame is a fixed-size record; a summaryslice holds a
      # variable number of frames, hence the jagged interpretation.
      interpretation = uproot.interpretation.jagged.AsJagged(
          uproot.interpretation.numerical.AsDtype(
              [
                  ("dom_id", ">i4"),
                  ("dq_status", "u4"),
                  ("hrv", "<u4"),
                  ("fifo", "<u4"),
                  ("status3", "u4"),
                  ("status4", "u4"),
              ]
              + [(f"ch{c}", "u1") for c in range(31)]
          ),
          header_bytes=10,
      )

      # Expression dictionary: branch name -> interpretation
      expressions = {branch_name: interpretation}

      # Each iteration loads at most step_size summaryslices into memory
      for summaryslices in f[summaryslices_addr].iterate(expressions, step_size=10000):
          ss = summaryslices[branch_name]
          print(f"{len(ss)} summaryslices: {ss}")

      gives the following output:

      10000 summaryslices: [[{dom_id: 808447066, dq_status: 218106880, hrv: 128, ... ch29: 48, ch30: 46}]]
      10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 49, ch30: 44}]]
      10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 48, ch30: 45}]]
      10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 48, ch30: 44}]]
      10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 45, ch30: 43}]]
      10000 summaryslices: [[{dom_id: 806483362, dq_status: 251661824, hrv: 128, ... ch29: 49, ch30: 45}]]
      10000 summaryslices: [[{dom_id: 806451584, dq_status: 251661824, hrv: 128, ... ch29: 44, ch30: 41}]]
      10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 47, ch30: 44}]]
      10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 49, ch30: 45}]]
      10000 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 50, ch30: 44}]]
      7895 summaryslices: [[{dom_id: 806451584, dq_status: 234884352, hrv: 128, ... ch29: 48, ch30: 43}]]
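
To make the record layout behind the interpretation more concrete, here is a minimal sketch that uses only NumPy and synthetic data (no ROOT file and no uproot). It rebuilds the same field list as a structured dtype, round-trips a few made-up frames through raw bytes, and stacks the 31 channel fields into one 2D array for vectorised work. The dom_id and channel values are invented for illustration, and the 10 header bytes handled by AsJagged are not modelled here.

```python
import numpy as np

# Same field list as in the interpretation above, as a NumPy structured dtype.
frame_dtype = np.dtype(
    [
        ("dom_id", ">i4"),
        ("dq_status", "u4"),
        ("hrv", "<u4"),
        ("fifo", "<u4"),
        ("status3", "u4"),
        ("status4", "u4"),
    ]
    + [(f"ch{c}", "u1") for c in range(31)]
)

# Three synthetic frames; all values are made up for illustration.
frames = np.zeros(3, dtype=frame_dtype)
frames["dom_id"] = [806451584, 806483362, 808447066]
for c in range(31):
    frames[f"ch{c}"] = 40 + c % 10

# Round trip through raw bytes, decoding fixed-size records with the dtype.
raw = frames.tobytes()
decoded = np.frombuffer(raw, dtype=frame_dtype)

# Stack the per-channel fields into a (n_frames, 31) array for vectorised use.
channels = np.stack([decoded[f"ch{c}"] for c in range(31)], axis=1)

print(frame_dtype.itemsize, len(raw), channels.shape)
```

Six 4-byte fields plus 31 single-byte channels give a packed record of 55 bytes, so the byte buffer for three frames is 165 bytes long.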
      Edited by Tamas Gal
    • Author Developer

      I have committed a new version of the script that uses this feature. Now it's much faster: processing the same file on the same computer takes about 10-20 seconds, whereas before it took about 4 minutes.

      By the way, what are the units returned by the get_rate() function in km3io? Please tell me they're Hz...

  • Rodrigo G. Ruiz added 9 commits

  • Rodrigo G. Ruiz added 18 commits

  • added 1 commit

    • 1508e51f - redo the reading with the chunks to memory approach.

    • Awesome! Yes that should be Hz, I should add that to the docs 😂
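
For reference, here is a rough sketch of how one-byte channel values such as ch29: 48 could be turned into rates in Hz for a whole chunk at once. It assumes a logarithmic encoding between a minimal and a maximal rate, with a raw value of 0 meaning 0 Hz. The constants and the helper name rates_in_hz are assumptions made for this illustration (recalled from the Jpp convention, not taken from this thread) and should be checked against the actual get_rate() in km3io before relying on them.

```python
import numpy as np

# ASSUMED constants: recalled from the Jpp convention, verify against km3io.
MINIMAL_RATE_HZ = 2.0e3
MAXIMAL_RATE_HZ = 2.0e6
RATE_FACTOR = np.log(MAXIMAL_RATE_HZ / MINIMAL_RATE_HZ) / 255


def rates_in_hz(raw):
    """Hypothetical helper: map raw one-byte values to rates in Hz (0 stays 0)."""
    raw = np.asarray(raw, dtype=np.float64)
    return np.where(raw > 0, MINIMAL_RATE_HZ * np.exp(raw * RATE_FACTOR), 0.0)


# Two synthetic modules with three channels each (values are made up).
raw_channels = np.array([[0, 48, 46], [49, 44, 255]], dtype=np.uint8)

channel_rates = rates_in_hz(raw_channels)  # shape (2, 3), in Hz
module_rates = channel_rates.sum(axis=1)   # total rate per module, in Hz

print(channel_rates)
print(module_rates)
```

Under these assumptions the conversion is a single vectorised expression, so it can be applied to a full chunk of summaryslices without a Python loop over frames.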

    • Author Developer

      Perfect. I will probably change the contents and structure of the output file (I may also need to add HRV information for each module). That's why the function append_to_file looks far from ideal for the moment.

      For the input options, I've chosen the same convention as used in Jpp for consistency, but I've always thought they're not the most intuitive ones (-f for the input file, -a for the detector file, -o for the output file). Are these OK, or should we use other ones?

  • added 1 commit

  • Rodrigo G. Ruiz added 2 commits

  • Tamas Gal @tgal started a thread on commit 2daf6f5e

      """
    - Usage: extract-dom-rates.py -f INPUT_FILE -o OUTPUT_FILE -a DETECTOR_FILE
    + Usage: extract-dom-rates.py -i INPUT_FILE [-o OUTPUT_FILE -d DETECTOR_FILE]

  • I suggest changing this line; otherwise -s will not work (or would need to be listed explicitly).

    Suggested change
    - Usage: extract-dom-rates.py -i INPUT_FILE [-o OUTPUT_FILE -d DETECTOR_FILE]
    + Usage: extract-dom-rates.py [options] -i INPUT_FILE
  • Tamas Gal @tgal started a thread on commit 2daf6f5e

      for key in arguments:
          data[key.replace("-", "")] = arguments[key]

    + if (data["output_file"]==None):
    +     data["output_file"]=Path(arguments['--input_file']).stem+'.rates.hdf5'
      # Read list of modules from detector file, and map to indices.
    - detector = km3pipe.hardware.Detector(data["detector_file"])
    +
    + reader = km3io.online.SummarysliceReader(data["input_file"], 10000)
    +
    + dom_ids = []
    +
    + if (data["detector_file"]==None):
    +     print("No detector file supplied, analysing summaryslices to check for available module IDs.")
    +     for s in reader.summaryslices:
  • Tamas Gal @tgal started a thread on commit 2daf6f5e

      h5.create_dataset("frame_indices", (0,), maxshape=(None,))

      # Read the channel rates from the summary slices, calculate the total module rates, and save them to the output file
    - reader = km3io.online.SummarysliceReader(data["input_file"], 10000)

      for ss_chunk in reader:
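
The chunk-by-chunk writing discussed in this merge request relies on resizable datasets like the frame_indices one created above. Below is a minimal sketch of that pattern with h5py and NumPy. The helpers append_rows and fake_chunks, the dataset name rates, the number of modules and the random values are all hypothetical stand-ins for illustration; in the real script the chunks would come from the summaryslice reader.

```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(dataset, rows):
    """Grow a resizable dataset along axis 0 and write rows at the end."""
    old_size = dataset.shape[0]
    dataset.resize(old_size + len(rows), axis=0)
    dataset[old_size:] = rows


def fake_chunks(n_chunks, chunk_size, n_doms):
    """Hypothetical stand-in for the reader: yields random module rates."""
    rng = np.random.default_rng(seed=0)
    for _ in range(n_chunks):
        yield rng.uniform(5e3, 10e3, size=(chunk_size, n_doms))


n_doms = 4
path = os.path.join(tempfile.mkdtemp(), "example.rates.hdf5")

with h5py.File(path, "w") as h5:
    # maxshape=None along axis 0 makes the datasets extendable.
    rates = h5.create_dataset(
        "rates", shape=(0, n_doms), maxshape=(None, n_doms), dtype="f8"
    )
    frame_indices = h5.create_dataset(
        "frame_indices", shape=(0,), maxshape=(None,), dtype="i8"
    )
    n_written = 0
    for chunk in fake_chunks(n_chunks=3, chunk_size=1000, n_doms=n_doms):
        # Only one chunk is held in memory at any time.
        append_rows(rates, chunk)
        append_rows(frame_indices, np.arange(n_written, n_written + len(chunk)))
        n_written += len(chunk)

with h5py.File(path, "r") as h5:
    final_shape = h5["rates"].shape
    last_index = int(h5["frame_indices"][-1])

print(final_shape, last_index)
```

With this layout the memory footprint is set by the chunk size rather than by the size of the input file, which is the point of the step-wise approach.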
  • Tamas Gal @tgal started a thread on commit 2daf6f5e

      for key in arguments:
          data[key.replace("-", "")] = arguments[key]

    + if (data["output_file"]==None):

    • In Python, comparison to None should be done with is (an identity check) rather than ==.

      Suggested change
      - if (data["output_file"]==None):
      + if (data["output_file"] is None):
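
As a small illustration of why the identity check is the safer choice, the sketch below uses only the standard library. The class AlwaysEqual and the helper default_output_file are hypothetical names made up for this example; the fallback file name follows the '.rates.hdf5' convention from the diff above, and the input path is invented.

```python
from pathlib import Path


class AlwaysEqual:
    """Hypothetical object whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


def default_output_file(input_file, output_file=None):
    """Fall back to '<input stem>.rates.hdf5' when no output file is given."""
    if output_file is None:
        return Path(input_file).stem + ".rates.hdf5"
    return output_file


obj = AlwaysEqual()
equals_none = obj == None  # noqa: E711, calls the overridden __eq__
is_none = obj is None      # identity check, cannot be overridden

name = default_output_file("/data/KM3NeT_00000094_00010347.root")
explicit = default_output_file("/data/run.root", "custom.h5")

print(equals_none, is_none, name, explicit)
```

The == comparison is delegated to the object's own __eq__, so it can return misleading results (or, for array-like objects, an element-wise result), whereas is None always tests identity.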