Fine tuning of data types and layout
This is the merge request to discuss the next iteration of tuning data types and layout of the output file.
I added two columns to each rates matrix which hold the UTC seconds and nanoseconds of the beginning of each summaryslice time interval. The intervals are counted from the timestamp of the very first summaryslice in the file, which is now treated as the run start.
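To make that convention concrete, here is a minimal sketch of how the interval start times follow from the run start alone. It is not taken from scripts/extract_rates.jl: the function name interval_starts and its arguments are hypothetical, and the numbers are the ones from the example file shown further down.

```julia
# Hypothetical helper, not part of the actual extraction script.
# Returns the UTC second at which each accumulation interval starts,
# counting from the timestamp of the first summaryslice (the run start).
function interval_starts(run_start_s::Integer, time_interval::Integer, n_rows::Integer)
    return [run_start_s + (i - 1) * time_interval for i in 1:n_rows]
end

starts = interval_starts(1676246415, 600, 18)
println(first(starts), " ... ", last(starts))
```

With a 600 s interval and 18 rows, the last entry should agree with the utc_s value of the last row in the example output.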
As mentioned today in the OH meeting, the nanoseconds now always have a constant value, since the time interval has a resolution of seconds. I don't think it makes much sense to put the nanoseconds in the file, but they also do not hurt since they will be compressed away anyway. It just looks a bit weird, and I also don't think we need sub-second precision for a time interval that is typically on the order of several dozens or hundreds of seconds.
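The reason the nanosecond part never changes can be seen with a little integer arithmetic. This is only a sketch, assuming the timestamp is kept as a (seconds, nanoseconds) pair and the interval is a whole number of seconds; advance is a made-up name for illustration and does not exist in the script.

```julia
# Made-up helper for illustration: advance a (seconds, nanoseconds)
# timestamp by a whole number of seconds.
function advance(utc_s::Integer, utc_ns::Integer, step_s::Integer)
    ns_per_s = Int64(1_000_000_000)
    total_ns = Int64(utc_s) * ns_per_s + Int64(utc_ns) + Int64(step_s) * ns_per_s
    # divrem splits the total back into whole seconds and leftover nanoseconds
    return divrem(total_ns, ns_per_s)
end

s, ns = advance(1676246415, 900_000_000, 600)
println((s, ns))
```

Since the step only ever adds multiples of 10^9 ns, the remainder stays at whatever the first summaryslice had, which is why every row carries the same utc_ns.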
Two questions:
- do we want sub-second resolution in the time interval parameter?
- should we get rid of the utc_ns column in the rates matrices?
This is how the output file is generated and what it looks like right now (accumulating data over 600 s time intervals):
~/Dev/pmt-rates-evolution main* 14s
❯ juliap scripts/extract_rates.jl 600 KM3NeT_00000133_00014728.root
Loading libraries...
Opening /Volumes/Ultraspeed/Data/Obelix/KM3NeT_00000133_00014728.root
Retrieving detector description for detector 133 from the database
Extracting summary data with a time interval of 600 seconds
Progress: 100%|████████████████████████████████████████| Time: 0:00:53
Finalising the output file: KM3NeT_00000133_00014728.pmtrates.h5
and these are its contents:
julia> using HDF5, DataFrames
julia> f = h5open("KM3NeT_00000133_00014728.pmtrates.h5");
julia> f["808467545/max_rate"][:] |> DataFrame
18×34 DataFrame
 Row │ utc_s       utc_ns     duty_cycle  ch0        ch1        ch2   ⋯
     │ Int32       Int32      Float32     Float32    Float32    Float ⋯
─────┼─────────────────────────────────────────────────────────────────
   1 │ 1676246415  900000000       0.955    55987.2    39368.4  3629  ⋯
   2 │ 1676247015  900000000         1.0    10160.4    6586.84  754
   3 │ 1676247615  900000000         1.0    53034.6    15254.0  3256
   4 │ 1676248215  900000000         1.0    17946.1    11953.7  1261
   5 │ 1676248815  900000000         1.0    7749.35    6586.84  695   ⋯
   6 │ 1676249415  900000000         1.0   136875.0    73406.7  9117
   7 │ 1676250015  900000000         1.0  3.16979e5    20549.2  14846
   8 │ 1676250615  900000000         1.0    18945.3    14449.5  1444
   9 │ 1676251215  900000000         1.0      2.0e6      2.0e6        ⋯
  10 │ 1676251815  900000000         1.0      2.0e6    16999.7
  11 │ 1676252415  900000000         1.0    15254.0    11020.6  1163
  12 │ 1676253015  900000000         1.0    86362.3    13687.5  3532
  13 │ 1676253615  900000000         1.0    18438.9    8873.38  1654  ⋯
  14 │ 1676254215  900000000         1.0    10439.4    16103.2  1072
  15 │ 1676254815  900000000         1.0    40449.4    15254.0  3256
  16 │ 1676255415  900000000         1.0    43873.9    42701.4  4387
  17 │ 1676256015  900000000         1.0   774935.0  4.38739e5        ⋯
  18 │ 1676256615  900000000    0.854833    46316.5    34381.4  3831
29 columns omitted
julia> f["runinfo"][:] |> DataFrame
1×6 DataFrame
 Row │ run    utc_s       utc_ns     time_interval  idx    n_rows
     │ Int32  Int32       Int32      Int32          Int64  Int64
─────┼───────────────────────────────────────────────────────────
   1 │ 14728  1676246415  900000000            600      1      18
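For a quick sanity check of these values, the (utc_s, utc_ns) pair can be turned into a human-readable timestamp with the Dates standard library. This is just a sketch: to_datetime is a hypothetical helper name, and since DateTime only has millisecond resolution, the nanoseconds are truncated.

```julia
using Dates

# Hypothetical helper: convert a (utc_s, utc_ns) pair to a DateTime.
# DateTime only has millisecond resolution, so the nanoseconds are
# truncated to whole milliseconds.
to_datetime(utc_s::Integer, utc_ns::Integer) =
    unix2datetime(utc_s) + Millisecond(div(utc_ns, 1_000_000))

run_start = to_datetime(1676246415, 900_000_000)
println(run_start)
```

Applied to the runinfo row above, this should give a run start shortly after midnight UTC on 13 February 2023.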
Let me ping @mdejong @vpestel @vkulikovskiy @laphecetche