Title: Processing overview
Author: Jutta
status: draft

Data taking

Data processing follows a tier-based approach. In a first step, the recorded photon "hits" are filtered for patterns related to particle interactions (triggering), which creates data at a first, event-based data level. In a second step, the events are processed by applying calibration, particle reconstruction and data analysis methods. This leads to enhanced data sets and requires a high-performance computing infrastructure that allows flexible application of modern data processing and data mining techniques.
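The first filtering step can be illustrated with a toy time-coincidence trigger. This is a minimal sketch only, assuming that an event candidate is a group of hits in which consecutive hits lie close together in time. The function name `find_triggered_events`, the 20 ns window and the minimum of three hits are hypothetical values chosen for illustration and are not the trigger conditions actually used for KM3NeT data.

```python
from typing import List


def find_triggered_events(
    hit_times_ns: List[float],
    window_ns: float = 20.0,  # hypothetical coincidence window
    min_hits: int = 3,        # hypothetical minimum multiplicity
) -> List[List[float]]:
    """Group hits into event candidates.

    Hits are sorted in time. A group is extended as long as the gap to the
    previous hit is at most `window_ns`; groups with fewer than `min_hits`
    hits are discarded as uncorrelated background.
    """
    events: List[List[float]] = []
    current: List[float] = []
    for t in sorted(hit_times_ns):
        if current and t - current[-1] > window_ns:
            # Gap too large: close the current group.
            if len(current) >= min_hits:
                events.append(current)
            current = []
        current.append(t)
    # Do not forget the last open group.
    if len(current) >= min_hits:
        events.append(current)
    return events


# Toy hit times in nanoseconds (invented for this example).
hits = [5.0, 100.0, 104.0, 111.0, 300.0, 900.0, 905.0]
events = find_triggered_events(hits)
print(events)
```

Only the three hits around 100 ns form a group large enough to pass; the isolated hits and the pair near 900 ns are dropped. A real trigger would also use spatial information on the hits, which this sketch leaves out.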

For physics analyses, derivatives of these enriched data sets are generated and their information is reduced to low-volume high-level data, which can be analysed locally and integrated into the scientist's analysis workflow. To make the data interpretable, a full Monte Carlo simulation of the data generation and processing chain, starting at the primary data level, is run. It generates simulated reference data for cross-checks at all processing stages and for the statistical interpretation of the particle measurements.

Overview of data levels

Event data processing

Photon-related information is written to ROOT-based tree-like data structures and accumulated during a predefined data taking time range of usually several hours (so-called data runs) before being transferred to high-performance computing (HPC) clusters.

Processed event data sets at the second level represent input to physics analyses, e.g. regarding neutrino oscillation and particle properties, and studies of atmospheric and cosmic neutrino generation. Enriching the data to this end involves probabilistic interpretation of temporal and spatial photon distributions for the reconstruction of event properties in both measured and simulated data, and requires high-performance computing capabilities.

Access to data at this level is restricted to collaboration members due to the intense use of computing resources, the large volume and complexity of the data and the members' primary exploitation right of KM3NeT data. Nevertheless, data at this stage are already converted to HDF5, a less customized hierarchical format. This format choice increases interoperability, facilitates the use of data analysis software packages such as those used in machine learning, and helps pave the way to wider collaborations within the scientific community utilizing KM3NeT data.

High level data and data derivatives

Summary formats and high-level data

Since the majority of physics analyses mainly need information on particle type, properties and direction, a high-level summary format has been designed to reduce the complex event information to simplified arrays, which allow an event data set to be represented easily as a table-like data structure. Although this already reduces the data volume, these neutrino data sets are still dominated by atmospheric muon events at a ratio of about 10^6 : 1. Since atmospheric muons are considered background events in many analyses, for both astrophysics and oscillation studies, publication of low-volume general-purpose neutrino data sets requires further event filtering. Here, the choice of optimal filter criteria usually depends on the properties of the expected flux of the signal neutrinos and is made using the simulated event sets.
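How such a filter criterion might be chosen from simulated events can be sketched as follows. This is an illustration under stated assumptions, not the selection procedure used by the collaboration: each simulated event is assumed to carry a single classification score (higher meaning more neutrino-like), and the names `expected_purity` and `choose_cut`, the toy scores and the purity target of 0.5 are all hypothetical. Only the background-to-signal ratio of about 10^6 : 1 is taken from the text.

```python
from typing import Optional, Sequence, Tuple


def expected_purity(sig_eff: float, bkg_eff: float, bkg_per_signal: float) -> float:
    """Fraction of selected events that are signal.

    Per signal event before the cut there are `bkg_per_signal` background
    events, so after the cut: purity = eff_s / (eff_s + ratio * eff_b).
    """
    n_sig = sig_eff
    n_bkg = bkg_eff * bkg_per_signal
    total = n_sig + n_bkg
    return n_sig / total if total > 0 else 0.0


def choose_cut(
    signal_scores: Sequence[float],
    background_scores: Sequence[float],
    bkg_per_signal: float = 1e6,  # ratio quoted in the text
    min_purity: float = 0.5,      # hypothetical analysis target
) -> Optional[Tuple[float, float, float]]:
    """Return (cut, signal efficiency, background efficiency) for the
    loosest cut `score >= cut` that reaches `min_purity`, or None."""
    for cut in sorted(set(signal_scores) | set(background_scores)):
        sig_eff = sum(s >= cut for s in signal_scores) / len(signal_scores)
        bkg_eff = sum(b >= cut for b in background_scores) / len(background_scores)
        if expected_purity(sig_eff, bkg_eff, bkg_per_signal) >= min_purity:
            return cut, sig_eff, bkg_eff
    return None


# Toy simulated scores (invented for this example).
sim_signal = [0.2, 0.6, 0.8, 0.9]
sim_background = [0.1, 0.2, 0.3, 0.7]
result = choose_cut(sim_signal, sim_background)
print(result)
```

With a background excess of 10^6 : 1, a purity of 0.5 requires the background efficiency to be at most 10^-6 times the signal efficiency. In this toy sample the chosen cut therefore removes every simulated background event, which only shows that the simulated background sample is far too small to measure such a rejection; a realistic study needs correspondingly large simulated event sets.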