---
Title: Courses and Webinars
Author: Dimitris
Topics:
......
---
Title: Open data formats
Author: Jutta
status: dump
---
# Open data formats
## Neutrino sets in the VO
Tabulated high-level neutrino event data can be provided through the VO
registry, utilizing access protocols like the Table Access Protocol
(TAP) and query languages like the Astronomical Data Query Language
(ADQL). To query these data sets for astronomical sources, the
Simple Cone Search (SCS) protocol allows picking specific events
according to particle source direction, using Unified Content
Descriptors (UCDs) to identify the relevant table columns. The
underlying data format is the VOTable, which allows for metadata
annotation of data columns. As the DaCHS software provides server-side input
capabilities for various formats like FITS\footnote{Flexible Image
Transport System, \url{https://fits.gsfc.nasa.gov/}.} or text-based
tables, a common KM3NeT open event table format
can be chosen largely independently, and the interface can be adapted such that high-level neutrino data
sets can be offered both through the VO and through alternative access
protocols, as long as the required metadata description is handled
adequately.
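As an illustration, a cone search of the kind described above could be issued against a TAP service with the `pyvo` client; the sketch below assumes a hypothetical service URL, table name (`km3net.events`) and column names, since the actual KM3NeT schema is still to be defined.
```python
# A minimal sketch of querying a VO-published neutrino event table via TAP/ADQL
# with pyvo. The service URL, table name and column names are hypothetical
# placeholders; the actual KM3NeT schema is still to be defined.
import pyvo

tap = pyvo.dal.TAPService("http://vo.example.org/tap")  # hypothetical endpoint

# Cone search expressed in ADQL: select events within 5 degrees of a source
# position, relying on the UCD-annotated coordinate columns (assumed ra/dec).
query = """
    SELECT event_id, ra, dec, energy
    FROM km3net.events
    WHERE 1 = CONTAINS(POINT('ICRS', ra, dec),
                       CIRCLE('ICRS', 83.63, 22.01, 5.0))
"""
result = tap.search(query)
print(result.to_table())
```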
At the current stage, VO standards are not fully adapted
to the inclusion of neutrino data and require the development of metadata
standards for easy interpretability of the data, a matter which is targeted within the ESCAPE project. Open questions in this regard are the linkage of observation probabilities to a given event selection, the inclusion of ``non-observation'' in a given field of view
and within a given time as relevant scientific information retrievable
from the services, and the introduction of a dedicated vocabulary for the
description of neutrino data. This vocabulary will need to be developed within KM3NeT as a matter of internal standardization; however, the process will draw guidance from VO expertise and the VO framework.
## Multimessenger alerts
Single or stacked neutrino events with a high probability of being of astrophysical origin will be selected in KM3NeT to trigger alerts to other observatories, indicating a
possible target for multimessenger observations \cite{MM}. The
VOEvent format, together with the VOEvent Transport
Protocol as implemented in the Comet
software\footnote{J. Swinbank, Comet, \url{https://comet.transientskp.org}.}, will be used to distribute these events as outgoing alerts. As the format is specifically tailored to use
in multimessenger alerts, and thus to a quite restricted scientific target, the provision of context information for the events can be adapted specifically to this use case. However, harmonization of metadata standards like parameter descriptors and event identifiers with those of the full neutrino event sets will also have to be implemented.
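For illustration, a received alert in the VOEvent format could be inspected with the `voevent-parse` library as sketched below; the filename is a placeholder, and the content of real KM3NeT alerts may be structured differently.
```python
# A sketch of inspecting a received VOEvent alert with the voevent-parse
# library; the filename is a placeholder and real KM3NeT alerts may carry a
# different parameter layout.
import voeventparse as vp

with open("km3net_alert.xml", "rb") as f:
    voevent = vp.load(f)

print(voevent.attrib["role"])               # e.g. 'observation' or 'test'
position = vp.get_event_position(voevent)   # Position2D(ra, dec, err, units, system)
print(position.ra, position.dec, position.err)
```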
## Providing simulation-driven services
Providing context information on a broader scale, e.g. in the form of sensitivity services and instrument response functions published alongside the VO data sets, is still under investigation. On the one hand, VO access protocols like TAP facilitate standardized queries on such services. On the other hand, integrating these services with the data sets in a meaningful and user-transparent way, e.g. through VO DataLink, still requires further deliberation. Therefore, the development of these services will be use-case driven and will also include the application of similar services for studies in other fields of KM3NeT research beyond astrophysics.
---
Title: Metadata generation and datamodels
Author: Jutta
Topics:
* data models
* configurations for software
---
* Started collection of metadata definitions (from publication level), based on standards of e.g. W3C, IVOA, DataCite etc.
* Basic class for open science publications: km3resource
* To be extended to earlier processing stages to standardize data processing and handling
* Basic identifying data classes: ktype (pointing to a class definition) and kid (unique identifier of objects in KM3NeT); see the sketch below
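A schematic sketch of how these identifying classes could look in Python is given below; only the names km3resource, ktype and kid are taken from the notes above, while all concrete fields are illustrative assumptions.
```python
# Schematic sketch only: the names km3resource, ktype and kid come from the
# notes above, while the concrete fields are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class KM3Resource:
    """Basic class for an open science publication item (a "km3resource")."""
    kid: str                 # unique identifier of the object within KM3NeT
    ktype: str               # pointer to the class definition of the object
    title: str = ""
    creators: list = field(default_factory=list)  # DataCite-style metadata
    license: str = "CC-BY-4.0"

resource = KM3Resource(kid="km3net-0001", ktype="dataset/event-table",
                       title="Example event selection")
print(resource.ktype, resource.kid)
```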
---
Title: KM3NeT server REST API
Author: Jutta
---
---
Title: Detector and Data Taking
Author: Jannik
(Mainly based on LoI)
---
## Detector
The KM3NeT Research Infrastructure will consist of a network of deep-sea neutrino detectors in the Mediterranean Sea with user ports for Earth and Sea sciences.
......
The ARCA (Astroparticle Research with Cosmics in the Abyss) detector is being installed ......
Technical details on the detector design are given in [1].
## Data Acquisition
The readout of the KM3NeT detector is based on the 'all-data-to-shore' concept, in which all analogue signals from the PMTs that pass a reference threshold are digitised. These data contain the time at which the analogue pulse crosses the threshold level, the time that the pulse remains above the threshold level (known as time-over-threshold, or ToT), and the PMT address. This is typically called a *hit*. All digital data (about 25 Gb/s per building block) are sent to a computing farm onshore, where they are processed in real time.
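As an illustration of the hit concept, the sketch below models a hit record as a NumPy structured array; the field names and widths are assumptions and do not reproduce the actual KM3NeT DAQ format.
```python
# Illustrative model of a hit record as a NumPy structured array; field names
# and widths are assumptions, not the actual KM3NeT DAQ format.
import numpy as np

hit_dtype = np.dtype([
    ("dom_id", np.uint32),   # optical module address
    ("pmt_id", np.uint8),    # PMT address within the module
    ("time",   np.int32),    # time of threshold crossing [ns]
    ("tot",    np.uint8),    # time over threshold [ns]
])

# A toy snapshot of the hits stored for one triggered event:
hits = np.zeros(3, dtype=hit_dtype)
hits[0] = (808447031, 12, 15204, 26)
print(hits["tot"])
```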
The recorded data are dominated by optical background noise: Cherenkov light from K-40 decays in the seawater, as well as bioluminescence from organisms in the deep sea. Events of scientific interest are filtered from the background using designated software that exploits the time-position correlations following from causality. To maintain all available information for the offline analyses, each event contains a snapshot of all the data in the detector during the event.
......
In parallel to the optical data, acoustic data and instrument data are recorded.
During operation, the continuous data stream from the detector is split into small time intervals, called *runs*, with typical durations of a few hours. This is done for practical reasons of data acquisition. In addition, this procedure allows selecting a set of run periods with high-quality data, based on the monitored detector status, environmental conditions and data quality. The calibration for timing, positioning and photon detection efficiency is done offline using the calibration data.
## Simulations
To assess the detector efficiency and systematics, dedicated Monte Carlo simulations are produced. Due to the changing data-taking conditions of the detector in the deep-sea environment, time-dependent simulation data sets are required. These are implemented in a run-by-run simulation strategy, where runs are sufficiently small time intervals of data taking with stable conditions. The detector response is simulated individually for these periods. The simulation data are generated at the raw-data level and are subjected to the same filter and reconstruction processing as the real data. Since large statistics are required for precise analyses, the simulation data will significantly exceed the real data in volume.
......
---
Title: Docker
Author: Tamas
......
---
Title: ESAP & ESCAPE
Author: Jutta
---
---
Title: The FAIR principles
Author: Jutta
Topics:
......
* dedication to open science
---
## Publishing FAIR data
The widely accepted paradigm for open science data publication requires the implementation of the FAIR principles \cite{FAIR} for research data. This involves the definition of descriptive and standardized metadata and the application of persistent identifiers to create a transparent and self-descriptive data regime. Interlinking these data with common science platforms and registries to increase findability, and enabling harvesting of the data through commonly implemented interfaces, is as mandatory as the definition of a policy standard covering licensing and access rights management. In all these fields, KM3NeT standards are currently developing, including the implementation of a data management plan, the installation of a data provenance model including the application of workflow management, and the setting of data quality standards. In this development process, the application of existing standards, especially from the astrophysics community, the development of dedicated KM3NeT software solutions, and the efforts of the KM3NeT-INFRADEV project\footnote{see \url{https://www.km3net.org/km3net-infradev/}} are integrated into the ESCAPE project\footnote{European Science Cluster of Astronomy \& Particle physics ESFRI research Infrastructures, \url{https://projectescape.eu}.}, which forms the main development environment for open data publication in KM3NeT.
## Compliance with the FAIR principles
Although the FAIR data principles define a series of criteria that data and metadata should meet in order to enhance their public usage [4], the KM3NeT Collaboration is working to ensure that the data for internal usage are also FAIR compliant. In the following sections, the internal compliance with the FAIR principles is detailed and the strategy for open-access data compliance with FAIR is described.
### Metadata for findability and accessibility
A database has been implemented which houses data and metadata related to different aspects of the KM3NeT research infrastructure. Amongst others, this database hosts metadata related to data-taking runs and calibrations, as well as the detector identifiers needed to find the existing data. Data storage at high-performance computing clusters is tracked, and files are identifiable through a unique numbering system in which filenames contain the detector identifier and the run number. The metadata contain all the information necessary to track the data in each file down to the original raw data from which they were produced. Additionally, the trigger parameters used in each run are also contained in the metadata. Metadata for software contain complete information about the software versions used to produce each data file, as well as about the computing environment.
Metadata are currently stored within the processed files, although options for external storage of metadata are being investigated to comply with high-performance data management systems like DIRAC [10] within the EOSC. External metadata storage will also preserve provenance information if outdated data sets are deleted.
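As an illustration of such a numbering system, the snippet below parses a detector identifier and run number from a filename; the naming pattern used here is an assumption, not the official KM3NeT convention.
```python
# Illustrative only: parse a detector identifier and run number from a data
# filename. The naming pattern is an assumption, not the official convention.
import re

FILENAME_PATTERN = re.compile(r"KM3NeT_(?P<det_id>\d+)_(?P<run>\d+)")

def parse_filename(name: str) -> dict:
    """Extract detector identifier and run number from a data filename."""
    match = FILENAME_PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognized filename: {name}")
    return {"det_id": int(match["det_id"]), "run": int(match["run"])}

print(parse_filename("KM3NeT_00000049_00008405.root"))
```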
### Standardization for access, interoperability and reusability
Currently, two different frameworks are maintained, documented and developed for official use within the KM3NeT Collaboration to work with the data: the KM3pipe framework, developed in Python [11], and the Jpp framework, which is based on C++. Complementing file storage in a ROOT-based format, an HDF5 format definition for both low- and high-level data is envisaged, so that the data can be accessed by open-source libraries without additional dependencies. All KM3NeT processing software is available in portable environments for use in Docker [9] or Singularity [14] to ensure portability, and is partly available under the MIT license.
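As a sketch of the intended dependency-free access, such an HDF5 file could be inspected with nothing but `h5py`; the group and dataset names below are hypothetical.
```python
# Sketch of dependency-free access to an envisaged KM3NeT HDF5 file using only
# h5py; the group and dataset names are hypothetical.
import h5py

with h5py.File("km3net_events.h5", "r") as f:
    f.visit(print)                    # list the group/dataset hierarchy
    events = f["/events/summary"][:]  # read a (hypothetical) event table
    print(events.dtype.names, len(events))
```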
Introduction of semantic metadata according to established conventions by the World Wide Web Consortium and extensions by the IVOA will further enhance the interoperability of the data processing chain and products.
### Compliance of open access data with FAIR principles
The properties which make the KM3NeT data compliant with the FAIR principles will be preserved in their transformation into the open-access data sets. During the last few months, contact has been established with the German Astrophysical Virtual Observatory (GAVO) [12], the German contribution to the IVOA. The first conversations between KM3NeT and GAVO members have focused on the standards required for the publication of data sets from cosmic neutrino searches in the Virtual Observatory. By respecting and further developing these standards, it is ensured that the provided data will comply with the FAIR principles.
---
Title: Gitlab and Github
Author: Tamas
......
---
Title: Architecture overview
Author: Jutta
Topics:
* servers, repositories, webpages
---
---
Title: KM3NeT server
Author: Jutta
---
* For all data not publishable through the IVOA, serving as interface and/or server to the data
* Based on Django REST API
* Usable for event data sets (hdf5-files with standardized metadata), plots or services, environmental data ...
* Data accessible through the webpage, through the REST API, or via a Python-based package (openkm3); see the sketch below
* Similar structure to e.g. Gravitational Wave Open Science Center
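A minimal sketch of talking to such a REST API with plain `requests` is shown below; the base URL and endpoint layout are assumptions modelled on a generic Django REST Framework service.
```python
# Sketch of querying the open data server with plain requests; the base URL
# and endpoint layout are assumptions modelled on a generic Django REST
# Framework service.
import requests

BASE_URL = "https://open-data.km3net.de/api"  # hypothetical endpoint

response = requests.get(f"{BASE_URL}/resources/", timeout=30)
response.raise_for_status()
for item in response.json():
    print(item.get("kid"), item.get("title"))
```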
---
Title: MM alerts
Author: Feifei
Topics:
......
Title: python interface
Author: Tamas, Jutta
---
Title: Open Science Portal
Author: Jutta
Topics:
* installation
* interface through software
---
* Git project serving webpage (Gitbook)
* Knowledge base for open data
* Pointing to all open science products
* Including introduction to KM3NeT data and linking to manuals and projects (how to handle the data, tutorials and manuals for KM3NeT members)
* Should be the central access point for external users
---
Title: Publication procedures
Author: Kay, Jutta
Topics:
* data/software releases
* publication procedures
---
## Establishing the Open Science Committee
## Software publication
### Software quality standards
Core requirements (code):
* Storage: git / CI/CD -> SFTP
* Installation: containers
* Documentation: wiki -> link to code documentation in git (doxygen or similar: reference guide/API, getting started & concepts)
* Change procedure: git workflow
* Coding standards:
  * C++ Style Guide / ROOT
  * Python Style Guide PEP-8
  * Java Style Guide
  * Julia? Fortran?
Recommendations:
* Tutorial: getting started / guides / concepts / API
* Examples
* Installation guide
### Publication procedures
#### Roles - who is what?
* Author: substantial contribution / idea
* Maintainer: responding to issues
* Copyright holders: KM3NeT (author list, DOI)
* Contributors
* WG Coordinator
* Referees: nominated by PC and/or OSC
* Last step -> actor should be maintainer, author or contributor
* Add collaborative feedback as a new step
Relating to:
* OSC: software & data "experts"
* PC
### Stages
#### Definition of roles, software candidates and (specific) standards
#### Internal development to meet standards (after pre-review)
#### Reviewing process
#### Publication
Certified by KM3NeT:
* License of KM3NeT
* Software repository choice
#### Maintenance
### Implementation
Setting up procedures and standards (who/what?):
* Update of the note, circulation through the collaboration
* Proposal on OSC to IB (from ECAP)
* Transition procedures
Current candidates:
* Without full reviewing process
* Interim referees:
  * Unused: either two referees (extensive)
  * Published results: one referee (shortened)
* Improve to meet standards
Software published, but not reviewed
(Software used, but not published) -> encourage publication
## Data publication
### Data quality standards
### Publication procedures
---
Title: Processing overview
Author: Jutta
---
Data processing follows a tier-based approach \cite{km3comp}, where initial
filtering for particle interaction-related photon patterns (triggering of photon
``hits'') creates data at a first, event-based data level.
In a second step, processing of the events, applying calibration,
particle reconstruction and data analysis methods, leads to enhanced data sets
and requires a high-performance computing infrastructure for flexible
application of modern data processing and data mining techniques.
For physics analyses, derivatives of these enriched data sets are
generated, and their information is reduced to low-volume high-level data which can be
analysed and integrated locally into the analysis workflow of the
scientist, see Figure \ref{levels}. For interpretability of the data, a full Monte Carlo
simulation of the data generation and processing chain, starting at the
primary data level, is run to generate reference simulated data for
cross-checks at all processing stages and for the statistical
interpretation of the particle measurements.
\begin{figure}
\includegraphics[width=\textwidth]{figs/Data_levels.pdf}
\caption{KM3NeT data levels related to open data publication, including data format description, user access rights and open data publication layer.} \label{levels}
\end{figure}
\subsection{Event-based data generation}
Data processing at the DAQ level follows the paradigms of particle physics
and utilizes the computing and software methodology of this community. At
the shore stations, event triggering in the Data Acquisition (DAQ)
system leads to a significant reduction of the data stream. The data stream also includes relevant
instrumentation readouts for a comprehensive understanding of data-taking
conditions. Photon-related information is written to
ROOT-based \cite{root} tree-like data structures and
accumulated during a predefined data-taking time range of usually several
hours (so-called data runs) before being transferred to high-performance
computing (HPC) clusters. Instrumentation and environmental data
collected at the detector site are stored separately in a central database. Acoustic and other environmental data serve as the basis for Earth and Sea science initiatives. Access to this information following an Open Science
approach is under development; however, it is beyond the
scope of this report.
Both the complex process of neutrino detection in a natural environment and the low expected
count rate of the cosmic neutrino signal in comparison to atmospheric background events
necessitate the full modelling
of particle generation, detector response and data processing. To this
end, a dedicated simulation chain, starting from cosmic air-shower
particle generation or astrophysical neutrino flux assumptions,
replicates the complete data-processing pipeline. At the event generation level, photon
distributions induced by these particles within the detection volume are
generated, masked by a simulation of the detector response, and subjected to
the same processing as measurements, starting from the second data
level of the offline event format.
\subsection{Event data processing}
Processed event data sets at the second level represent the input to physics analyses, e.g.~regarding neutrino oscillations and particle properties, and studies of
atmospheric and cosmic neutrino generation. Enriching the data to this
end involves the probabilistic interpretation of temporal and spatial photon distributions for the
reconstruction of event properties in both measured and simulated
data, and requires high-performance computing capabilities. Due to the
distributed infrastructure of the KM3NeT building blocks and the
contribution of computing resources from various partners, data
processing will, in the final detector configuration, necessitate a
federated computing approach, the implementation of which is prepared through containerization
of the required software and testing of distributed resource management
approaches. In this context, the use of middleware such as DIRAC\footnote{Distributed Infrastructure with Remote Agent Control Interware, \url{http://diracgrid.org/}} is explored, again linking
closely to the particle physics community.
Access to data at this level is restricted to collaboration members due
to the intense use of computing resources, the large volume and complexity of the data, and
the members' primary exploitation right to KM3NeT data. However, data at this stage are already converted to the
HDF5\footnote{The HDF5 file format, \url{https://www.hdfgroup.org/}} format as a less customized, hierarchical format. This format choice increases interoperability, facilitates the application of data analysis software packages used e.g.~in machine learning, and helps to pave the way to wider
collaborations within the scientific community utilizing KM3NeT data.
\subsection{High level data and data derivatives}
\subsubsection{Summary formats and high-level data}
As for the majority of physics analyses mostly the information on particle type,
properties and direction is relevant, a high-level summary format has been designed which
reduces the complex event information to simplified arrays,
allowing an event data set to be represented as a table-like data structure.
Although this already leads to a reduced data volume, these neutrino
data sets are still dominated by atmospheric muon events at a ratio of about
$10^{6}:1$. Since atmospheric muons are background events to
both astrophysics and oscillation studies, the publication of low-volume
general-purpose neutrino data sets requires further event filtering, as sketched below. The choice of optimal filter criteria usually depends on the properties of the
expected signal neutrino flux and is made using the simulated event sets.
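A sketch of such a filtering step on a table-like summary format is given below; the column names (`zenith`, `muon_score`) and the cut values are purely illustrative.
```python
# Illustrative filtering of a table-like summary format with pandas; the
# column names (zenith, muon_score) and the cut values are assumptions.
import pandas as pd

events = pd.read_hdf("km3net_summary.h5", key="events")  # requires PyTables

# Keep upgoing events with a low atmospheric-muon score:
neutrino_sample = events.query("zenith > 90 and muon_score < 0.05")
print(len(neutrino_sample), "of", len(events), "events kept")
```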
\subsubsection{Event simulation derivatives as service}
To correctly judge the statistical significance of a measured neutrino event rate,
the full high-level simulation data sets are used in KM3NeT-internal
studies to ensure a high accuracy of the count rate estimate. As handling
these large data sets is impractical for inter-experiment studies, but
the information is crucial for the interpretability of the data,
parameterized distributions of the relevant observables need to be derived
from the simulation data sets and offered as services. Even in the absence
of significant neutrino measurements during the construction phase of KM3NeT,
offering sensitivity estimates as in \cite{sensitivity} for given
models is beneficial for the development of common research goals, and the development of a corresponding open service is currently under investigation.
---
Title: Python interface
Authors: Tamas, Jutta
Topics:
* installation
* interface through software
---
## Python interface to KM3NeT data
### openkm3
* Small Python package to directly use open data in Python from the local computer
* Interfaces with the data centre API; allows querying datasets (km3resources)
* Provides functions to download & interpret data products
### Current capabilities
* Loading an HDF5 file as a pandas DataFrame, reading additional parameter info & provenance
* Reading histogram data as a plain table or pandas DataFrame, or automatically building the plot (see the usage sketch below)
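A usage sketch matching these capabilities is given below; the exact openkm3 entry points (class and method names) are assumptions and may differ in the released package.
```python
# Usage sketch only: the openkm3 entry points (class and method names) are
# assumptions and may differ in the released package.
from openkm3.store import KM3Store  # assumed entry point

store = KM3Store()      # talks to the data centre REST API
store.list()            # show the available km3resources

resource = store.get("km3net_example_dataset")  # hypothetical identifier
df = resource.get_dataframe()  # assumed helper: load HDF5 content as a DataFrame
print(df.head())
```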
---
Title: Quality management
Author: Rodri
Topics:
......
---
Title: VO repository
Author: Jutta
---
---
Title: Integrating to Zenodo
Author: Jutta
---
* For findability, data needs to be citable and registered within a large repository (e.g. DataCite)
* Zenodo is widely used in the community as a platform assigning persistent identifiers (DOIs) to datasets, images, publications & software
* Software can be integrated directly from GitHub
* KM3NeT "Community" created to connect the different KM3NeT contributions (queried in the sketch below)
* Can register smaller data samples (planned for KM3NeT example data) as well as public plots, posters etc.
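For illustration, records collected in such a community could be discovered through the public Zenodo REST API as sketched below; the community identifier `km3net` is an assumption.
```python
# Sketch of discovering KM3NeT records through the public Zenodo REST API;
# the community identifier "km3net" is an assumption.
import requests

response = requests.get("https://zenodo.org/api/records",
                        params={"communities": "km3net", "size": 10},
                        timeout=30)
response.raise_for_status()
for hit in response.json()["hits"]["hits"]:
    print(hit["doi"], "-", hit["metadata"]["title"])
```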
---
Title: Scientific targets
Author: Jannik
---
# Scientific Targets
The KM3NeT neutrino detectors will continuously register neutrinos from the whole sky. The neutrinos of astrophysical interest, i.e. those of extra-terrestrial origin, need to be identified against the background of atmospheric neutrinos, i.e. those created in Earth's atmosphere by interactions of cosmic-ray particles. Access to cosmic neutrino data is of high importance for a wide astrophysics community beyond the KM3NeT Collaboration, to relate cosmic neutrino fluxes to observations by other neutrino observatories or with other messengers [REFERENCE to Multimessenger], and to compare them with theoretical predictions. The atmospheric neutrinos carry information on the particle physics processes in which they are created and, in particular those registered with KM3NeT/ORCA, on the neutrinos themselves. These data are relevant for a wide astroparticle and particle physics community. Finally, KM3NeT will monitor marine parameters, such as bioluminescence, currents, water properties and transient acoustic signals, and will provide user ports for Earth and Sea sciences.
(Taken from Grant Agreement)
## Astrophysics
The main science objective of KM3NeT/ARCA is the detection of high-energy neutrinos of cosmic origin. Neutrinos represent an alternative to photons and cosmic rays to explore the high-energy Universe. Neutrinos can emerge from dense objects and travel large distances, without being deflected by magnetic fields or interacting with radiation and matter. Thus, even modest numbers of detected neutrinos can be of utmost scientific relevance, by indicating the astrophysical objects in which cosmic rays are accelerated, or pointing to places where dark matter particles annihilate or decay.
......
The ARCA detector allows reconstruction of the arrival direction of TeV-PeV neutrinos ......
Further details on the detector performance can be found in [1].
## Neutrino Physics
Neutrinos have the peculiar feature that they can change from one flavour to another when propagating over macroscopic distances. This phenomenon of neutrino flavour change is known as 'neutrino oscillation'. The Nobel Prize in Physics of the year 2015 was awarded to T. Kajita and A. B. McDonald for the discovery of neutrino oscillations, which shows that neutrinos have mass [1]. One open question is the so-called 'neutrino mass ordering'. It refers to the sign of one of the two independent neutrino mass differences, the absolute value of which has already been known for more than two decades.
......