Title: Quality management
Author: Rodri
Topics:
  - quality checks on event data
  - quality indicators
status: draft

# Introduction

The processes involved in the KM3NeT data processing chain can be grouped into a few main categories. Although the ordering of these categories is not strictly hierarchical from the point of view of data generation and processing, for the purposes of a general discussion one can safely assume that a hierarchy exists between them. From bottom to top, these categories are: data acquisition, detector calibration, event reconstruction, simulations and, finally, the scientific analyses based on the data processed in the previous categories. The quality of the scientific results produced by KM3NeT is affected by the performance of the processes at the lower levels of the data processing chain.

In order to implement a complete and consistent set of data quality control procedures spanning the whole data processing chain, a complete, unambiguous and documented data processing strategy is required for each of the aforementioned categories. This includes the setting of data quality criteria, which should be initiated at the highest level of the data processing chain and propagated towards the lowest levels. For each of these categories there exists a working group within the KM3NeT collaboration, and it falls to each working group to develop its objectives and procedures according to the scientific needs of KM3NeT. Currently, such a documented strategy does not exist for any working group, so it has not been possible to develop a full strategy for data quality control. Nevertheless, copious software developments have been devoted to quality control along the different stages of the data processing chain. In the following, a description of the quality control tools and procedures is given. This description can be regarded as an incomplete prototype for a data quality plan.

# Data quality control procedures

## Online Monitor

During the data acquisition process, the online monitoring software presents real-time plots that allow the shifters to promptly identify problems with the data acquisition. It includes an alert system that sends notifications to the shifters if problems requiring human intervention appear during data taking. The online monitor uses the same data that are stored for offline analyses (note: this statement is not accurate at present and should be revised). This implies that any anomaly observed during detector operation can be reproduced offline.
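
To make the alert logic concrete, the following is a minimal sketch of a threshold-based rate check, assuming a hypothetical stream of per-module rate samples; the nominal rate, the tolerance and the names (`RateSample`, `check_sample`) are illustrative and do not reflect the actual online-monitor software.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RateSample:
    module_id: int
    rate_hz: float

NOMINAL_RATE_HZ = 7000.0   # assumed nominal rate scale; illustrative only
TOLERANCE = 0.5            # alert if the rate deviates by more than 50%

def check_sample(sample: RateSample) -> Optional[str]:
    """Return an alert message if the rate deviates too far from nominal."""
    deviation = abs(sample.rate_hz - NOMINAL_RATE_HZ) / NOMINAL_RATE_HZ
    if deviation > TOLERANCE:
        return (f"module {sample.module_id}: rate {sample.rate_hz:.0f} Hz "
                f"deviates {deviation:.0%} from nominal")
    return None

for sample in (RateSample(42, 6900.0), RateSample(7, 16000.0)):
    message = check_sample(sample)
    if message:
        print("ALERT:", message)   # a real system would notify the shifters
```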

## Detector Operation

As explained in XX.YY, the optical data obtained from the detector operation are stored in ROOT files and moved to a high-performance storage environment. The offline data quality control procedures start with a first analysis of these files, which is performed daily. It focuses mainly, but not exclusively, on the summary data stored in the ROOT files. The summary data contain information on the performance of the data acquisition procedures for each optical module in the detector. This first analysis produces a set of key-value pairs, where each key corresponds to a parameter representing a given dimension of data quality and the value is the evaluation of this parameter over the livetime of the analysed data. The results are tagged with a unique identifier corresponding to the analysed data set and uploaded to the database. In the present implementation the analysis is performed for each available file, where each file corresponds to a data-taking run, although this may change in the future as the data volume generated per run increases with the detector size.
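
To illustrate the idea, below is a minimal sketch of producing such key-value pairs for a single run; the reader function `read_summary_data`, the parameter names and the file name are placeholders for this example, not the actual KM3NeT analysis tools.

```python
import json

def read_summary_data(path):
    """Placeholder for reading summary data from a run's ROOT file."""
    # In practice this would use e.g. uproot or the Jpp I/O classes.
    return {"livetime_s": 21600.0, "mean_hit_rate_hz": 6800.0,
            "dead_module_fraction": 0.02}

def quality_parameters(path, run_id):
    """Evaluate each quality parameter over the livetime of the run."""
    summary = read_summary_data(path)
    return {
        "run": run_id,                # unique identifier of the data set
        "livetime": summary["livetime_s"],
        "meanRate": summary["mean_hit_rate_hz"],
        "deadOMFraction": summary["dead_module_fraction"],
    }

record = quality_parameters("run_00001234.root", run_id=1234)
print(json.dumps(record))  # this record would be uploaded to the database
```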

A further analysis of the results stored in the database compares the values of the different parameters to reference values, allowing data periods to be classified according to their quality. The reference values are typically set according to the accuracy with which the current detector simulations reproduce the different quality parameters. In addition, the evolution of the different quality parameters can be monitored and made available to the full collaboration as reports. Currently this is done weekly by the shifters, and the reports are posted on an electronic logbook (ELOG).
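
A minimal sketch of such a classification step is shown below, assuming illustrative parameter names and accepted ranges; the actual reference values are set as described above.

```python
REFERENCE = {
    "meanRate": (5000.0, 9000.0),     # accepted range in Hz (illustrative)
    "deadOMFraction": (0.0, 0.05),    # accepted fraction of inactive modules
}

def classify_run(parameters: dict) -> str:
    """Label a run 'good' only if every monitored parameter is in range."""
    for name, (low, high) in REFERENCE.items():
        value = parameters.get(name)
        if value is None or not (low <= value <= high):
            return "bad"
    return "good"

print(classify_run({"meanRate": 6800.0, "deadOMFraction": 0.02}))  # good
print(classify_run({"meanRate": 6800.0, "deadOMFraction": 0.10}))  # bad
```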

## Calibration

The first step in the data processing chain is to determine the detector calibration parameters using the data obtained from the detector operation. These parameters include the time offsets, gains and efficiencies of the PMTs, as well as the positions and orientations of the optical modules. The PMT time offsets and the positions of the optical modules are used in later stages of the data processing chain for event reconstruction, as well as by the real-time data filter during detector operation. While the event reconstruction requires accurate knowledge of these parameters, the algorithms used by the real-time data filter depend only loosely on them, and their performance is not affected by variations occurring on a timescale of the order of months. Nevertheless, it is still necessary to monitor these parameters and, if necessary, correct the values used by the data filter. The performance of the detector operation also depends on the response of the PMTs, which is partly determined by their gains. The gains evolve over time and can be reset to their nominal values by tuning the high voltage applied to each PMT. Monitoring the PMT gains is therefore also necessary to maximise the detector performance. Additionally, the PMT gains and efficiencies are used offline by the detector simulation. Within the context of data quality assessment, KM3NeT has developed software tools to monitor the parameters described above and to compare them to reference values, raising alerts when necessary. The reference values should be determined by the impact of miscalibrations on the scientific goals of KM3NeT; this work has not yet been addressed. The arrangement of these tools into a workflow requires the elaboration of an underlying calibration strategy. This has not been done, and the work is therefore on hold.
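
As an illustration, the following is a minimal sketch of a gain check of this kind; the nominal gain, the tolerance and the input format are assumptions for the example, and the actual monitoring tools are more elaborate.

```python
NOMINAL_GAIN = 1.0     # gains expressed relative to nominal; assumed convention
TOLERANCE = 0.1        # alert beyond 10% deviation (illustrative threshold)

def check_gains(gains: dict) -> list:
    """Return the PMTs whose fitted gain deviates too far from nominal."""
    alerts = []
    for pmt_id, gain in gains.items():
        if abs(gain - NOMINAL_GAIN) / NOMINAL_GAIN > TOLERANCE:
            alerts.append((pmt_id, gain))
    return alerts

fitted_gains = {"DOM1-PMT0": 0.98, "DOM1-PMT1": 1.25}  # e.g. from a gain fit
for pmt_id, gain in check_gains(fitted_gains):
    print(f"ALERT: {pmt_id} gain {gain:.2f} outside ±{TOLERANCE:.0%} of nominal")
```

A PMT flagged this way would be a candidate for high-voltage retuning, as described above.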

## Simulations and Event reconstruction

Once the calibration constants have been determined, the data processing chain continues with the event reconstruction, and with the simulation (and reconstruction) of an equivalent set of events, where the information in the summary data is used to simulate the data-taking conditions. The simulation of particle interactions and propagation is done by dedicated software, while the detector simulation and the event reconstruction are done by Jpp. At the end of the process, the resulting data products are two ROOT-formatted files that contain the results of reconstructing the real data events and the simulated events, respectively. The contents of these files are the main ingredients for the physics analyses based on reconstructed event data, although these analyses are typically done with dedicated software frameworks which require a special formatting of the data. The data format used for physics analyses is called the 'aanet' format, which is also based on ROOT. Control mechanisms are thus needed to ensure consistency between the files produced by Jpp and the aanet-formatted files. Mechanisms exist to verify the consistency of the contents of both files. This is done by verifying that the distributions of multiple parameters related to the events, as obtained from each of these files, are identical. For this, software tools have been developed to generate the necessary distributions from each file.
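
As an illustration of such a check, below is a minimal sketch comparing the binned distribution of one event parameter as read from the two files; the use of numpy histograms and the randomly generated test values are assumptions for this example, whereas the actual tools generate the distributions from the Jpp and aanet files themselves.

```python
import numpy as np

def consistent(values_a, values_b, bins=50) -> bool:
    """The two files describe the same events, so the binned distributions
    of any event parameter must match exactly."""
    low = min(values_a.min(), values_b.min())
    high = max(values_a.max(), values_b.max())
    hist_a, _ = np.histogram(values_a, bins=bins, range=(low, high))
    hist_b, _ = np.histogram(values_b, bins=bins, range=(low, high))
    return np.array_equal(hist_a, hist_b)

# Stand-in for e.g. the reconstructed zenith angle from each file.
zenith_jpp = np.random.default_rng(0).uniform(0, np.pi, 1000)
zenith_aanet = zenith_jpp.copy()   # identical if the conversion is lossless
print(consistent(zenith_jpp, zenith_aanet))   # True: files are consistent
```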