Title: The FAIR principles
Author: Jutta
Topics:
  * Policy basics
  * Dedication to open science

Publishing FAIR data

The widely accepted paradigm for open science data publication requires the implementation of the FAIR principles \cite{FAIR} for research data. This involves defining descriptive, standardized metadata and applying persistent identifiers to create a transparent and self-descriptive data regime. Linking these data to common science platforms and registries, which increases findability and allows the data to be harvested through commonly implemented interfaces, is as mandatory as defining a policy standard that covers licensing and access rights management. In all these fields, the KM3NeT standards are currently under development, including the implementation of a data management plan, the installation of a data provenance model that includes workflow management, and the setting of data quality standards. In this development process, the application of existing standards, especially from the astrophysics community, the development of dedicated KM3NeT software solutions, and the efforts made during the KM3NeT-INFRADEV project\footnote{see \url{https://www.km3net.org/km3net-infradev/}} are integrated into the ESCAPE project\footnote{European Science Cluster of Astronomy & Particle physics ESFRI research Infrastructures, \url{https://projectescape.eu}.}, which forms the main development environment for open data publication in KM3NeT.

Compliance with the FAIR principles

The FAIR data principles define a series of criteria that data and metadata should meet in order to enhance their public usage [4]. Although these criteria target public data, the KM3NeT collaboration is working to ensure that the data for internal use are also FAIR compliant. The following sections detail the internal compliance with the FAIR principles and describe the strategy for making the open-access data FAIR compliant.

Metadata for findability and accessibility

A database has been implemented that houses data and metadata related to different aspects of the KM3NeT research infrastructure. Among other things, this database hosts metadata related to the data-taking runs and calibrations, as well as the detector identifiers needed to find the existing data. Data storage at high-performance computing clusters is tracked, and files are identifiable through a unique numbering system in which filenames contain the detector identifier and the run number. The metadata contain all the information necessary to trace the data in each file back to the original raw data from which they were produced, as well as the trigger parameters used in each run. Metadata for software contain complete information about the software versions used to produce each data file and about the computing environment.
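
As an illustration of how a detector identifier and a run number embedded in a filename make a file findable, the following minimal Python sketch parses and builds such names. The exact naming layout is not specified here, so the pattern `KM3NeT_<detector id>_<run number>.<extension>` with eight-digit fields, as well as the names `FILENAME_PATTERN`, `RunFile`, `parse_filename` and `build_filename`, are assumptions made for this example only.

```python
import re
from typing import NamedTuple

# Hypothetical layout: the text only states that filenames contain the
# detector identifier and the run number, not how they are arranged.
FILENAME_PATTERN = re.compile(
    r"KM3NeT_(?P<det_id>\d{8})_(?P<run>\d{8})\.(?P<ext>\w+)"
)


class RunFile(NamedTuple):
    """Identifiers recovered from a file name."""
    det_id: int
    run: int
    ext: str


def parse_filename(name: str) -> RunFile:
    """Extract detector identifier, run number and extension from a name."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name!r}")
    return RunFile(int(match["det_id"]), int(match["run"]), match["ext"])


def build_filename(det_id: int, run: int, ext: str = "root") -> str:
    """Inverse of parse_filename for the assumed layout."""
    return f"KM3NeT_{det_id:08d}_{run:08d}.{ext}"
```

With such a convention, a lookup by detector and run reduces to building the expected name, and any file found on storage can be mapped back to its database entry by parsing its name.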
Metadata are currently stored within the processed file, although options for external storage of metadata are being investigated to comply with high-performance data management systems like DIRAC [10] within the EOSC. External metadata storage will also preserve provenance information if outdated data sets are deleted.
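
The idea of external metadata storage can be sketched as a small sidecar record kept next to, but independent of, each data file. This is not the KM3NeT implementation: the `FileMetadata` fields, the `.meta.json` suffix and the helper names are hypothetical, chosen only to show that provenance can still be traced once the data files themselves are gone.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class FileMetadata:
    """Hypothetical external metadata record for one data file."""
    filename: str
    det_id: int
    run: int
    software: Dict[str, str]                          # package name -> version
    parents: List[str] = field(default_factory=list)  # files this one was derived from


def write_sidecar(meta: FileMetadata, directory) -> Path:
    """Store the record as JSON, separately from the data file."""
    path = Path(directory) / (meta.filename + ".meta.json")
    path.write_text(json.dumps(asdict(meta), indent=2, sort_keys=True))
    return path


def read_sidecar(path) -> FileMetadata:
    """Load a record previously written by write_sidecar."""
    return FileMetadata(**json.loads(Path(path).read_text()))


def trace_to_raw(filename: str, catalogue: Dict[str, FileMetadata]) -> List[str]:
    """Follow parent links until files without parents (raw data) are reached.

    Only metadata records are consulted, so the trace still works when
    intermediate data files have been deleted.
    """
    meta = catalogue[filename]
    if not meta.parents:
        return [filename]
    origins: List[str] = []
    for parent in meta.parents:
        for origin in trace_to_raw(parent, catalogue):
            if origin not in origins:
                origins.append(origin)
    return origins
```

Because the records live outside the data files, deleting an outdated intermediate file leaves the chain from a high-level product back to its raw data intact.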

Standardization for access, interoperability and reusability

Currently, two frameworks that provide access to the data are maintained, documented and developed for official use within the KM3NeT Collaboration: the KM3pipe framework, which is developed in Python [11], and the Jpp framework, which is a C++-based software design. To complement file storage in a ROOT-based format, an HDF5 format definition for both low-level and high-level data is envisaged, so that the data can be accessed with open-source libraries without additional dependencies. All KM3NeT processing software is available in portable environments for use with Docker [9] or Singularity [14], and part of it is available under the MIT license.
Introducing semantic metadata that follow established conventions of the World Wide Web Consortium (W3C) and extensions by the International Virtual Observatory Alliance (IVOA) will further enhance the interoperability of the data processing chain and its products.
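
To give an impression of what semantic metadata following W3C conventions can look like, the sketch below builds a JSON-LD description of a dataset using the W3C Data Catalog Vocabulary (DCAT) and Dublin Core terms. The choice of DCAT is an assumption made for illustration, since no specific vocabulary is named here, and IVOA extensions are not covered. The function name and all values in the usage example are placeholders.

```python
import json
from typing import Iterable


def dataset_description(identifier: str, title: str,
                        license_url: str, keywords: Iterable[str]) -> dict:
    """Return a JSON-LD dataset description using DCAT and Dublin Core terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:identifier": identifier,        # ideally a persistent identifier
        "dct:title": title,
        "dct:license": {"@id": license_url},
        "dcat:keyword": list(keywords),
    }


# Placeholder values only; none of them refers to a published dataset.
example = dataset_description(
    identifier="example-dataset-001",
    title="Example neutrino event sample",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["neutrino", "open data"],
)
serialized = json.dumps(example, indent=2)
```

Because every key resolves to a term in a public vocabulary, generic harvesters can interpret the record without knowing anything about the software that produced it.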

Compliance of open access data with FAIR principles

The properties that make the KM3NeT data compliant with the FAIR principles will be carried over when these data are transformed into open-access datasets. In recent months, contact has been established with the German Astrophysical Virtual Observatory (GAVO) [12], which is the German contribution to the IVOA. The first conversations between KM3NeT and GAVO members have focused on the standards required for publishing datasets from searches for cosmic neutrinos in the Virtual Observatory. Respecting and developing these standards ensures that the provided data will comply with the FAIR principles.