diff --git a/portal/content/articles/data.md b/portal/content/articles/data.md new file mode 100644 index 0000000000000000000000000000000000000000..916d32d8d1b40c29b32d127ef5713b9950cc59ef --- /dev/null +++ b/portal/content/articles/data.md @@ -0,0 +1,190 @@ ++++ +date = "2020-10-27T14:53:59Z" +title = "Data" +type = "article" +draft = true ++++ + +--- +* Processing +* Data formats +--- + +# Our Data +## Data taking + +Data processing follows a tier-based approach, where initial filtering for particle interaction-related photon patterns (triggering of photon "hits") serves to create data at a first event-based data level. +In a second step, processing of the events, applying calibration, particle reconstruction and data analysis methods leads to enhanced data sets, +requiring a high-performance computing infrastructure for flexible application of modern data processing and data mining techniques. + +For physics analyses, derivatives of these enriched data sets are generated and their information is reduced to low-volume high-level data which can be analysed and integrated locally into the analysis workflow of the +scientist. For interpretability of the data, a full Monte Carlo simulation of the data generation and processing chain, starting at the +primary data level, is run to generate reference simulated data for cross-checks at all processing stages and for statistic interpretation of the particle measurements. + + + +### Event data processing + +Photon-related information is written to ROOT-based tree-like data structures and accumulated during a predefined data taking time range of usually several hours (so-called data runs) before being transferred to high-performance computing (HPC) clusters. + +Processed event data sets at the second level represent input to physics analyses, e.g. regarding neutrino oscillation and particle properties, and studies of atmospheric and cosmic neutrino generation. Enriching the data to this end involves probabilistic interpretation of temporal and spatial photon distributions for the reconstruction of event properties in both measured and simulated data, and requires high-performance computing capabilities. + +Access to data at this level is restricted to collaboration members due to the intense use of computing resources, the large volume and complexity of the data and the members' primary exploitation right of KM3NeT data. However, data at this stage is already converted to HDF5 format as a less customized hierarchical format. This format choice increases interoperability and facilitates the application of data analysis software packages used e.g. in machine learning and helps to pave the way to wider collaborations within the scientific community utilizing KM3NeT data. + +### High level data and data derivatives + +#### Summary formats and high-level data + +As mostly information on particle type, properties and direction is relevant for +the majority of physics analyses, a high-level summary format has been designed to +reduce the complex event information to simplified arrays which allow for easy representation of an event data set as a table-like data structure. +Although this already leads to a reduced data volume, these neutrino data sets are still dominated by atmospheric muon events at a ratio of about $10^{6} :1$. Since, for many analyses, atmospheric muons are considered background events to both astrophysics and oscillation studies, publication of low-volume general-purpose neutrino data sets requires further event filtering. 
Here, the choice of optimal filter criteria usually depends on the properties of the expected signal neutrino flux and is determined using the simulated event sets. + + +## Open data sets and formats + +As all of the following data is published, inter alia, via the Open Data Center, the data sets are all enriched with metadata following the [KM3OpenResource description](Datamodels.md#resource-description). + +### Particle event tables + +#### Data generation +For particle event publication, the full information of a reconstructed event in the data level 2 files is reduced to a "one row per event" format by selecting the relevant parameters from the level 2 files. The event and parameter selection, metadata annotation and conversion of parameters to the intended output format are performed using the *km3pipe* software. The prototype provenance recording has also been included in this software, so that the output of the pipeline already includes the relevant metadata as well as provenance information. The software allows writing of the data to several formats, including text-based formats and hdf5, which are the two relevant formats used in this demonstrator. + +#### Data description + +**Scientific use** + +Particle event samples can be used both in astrophysics analyses and in neutrino oscillation studies, see the [KM3NeT science targets](ScienceTargets.md). Therefore, the data must be made available in a format suitable for the Virtual Observatory as well as for particle physics studies. + +**Metadata** + +The events, from which relevant *parameters* like particle direction, time, energy and classification parameters are selected for the generation of the event table, are enriched with the following metadata. + +| Metadata type | content | | ------------- | ------- | | *Provenance* information | list of processing steps (referenced by identifier) | | *Parameter* description | parameter name, unit (SI), type, description, identifier | | *Data taking* metadata | start/stop time, detector, event selection info | | *Publication* metadata | publisher, owner, creation date, version, description | + +#### Technical specification + +##### Data structure +The general data structure is an event list which can be displayed as a flat table, with the parameters of one event filling one row. Each event row contains an [event identifier](Datamodels.md#particle-event-identifiers). + +##### File format +For the tabulated event data, various output formats are used depending on the platform used for publication and the requirements for interoperability. The formats defined here at the moment are not exclusive and might be extended according to specific requests from the research community in the future. + +For hdf5 files as output, various options exist to store metadata, as several tables can be written to the same file and each table as well as the file itself can hold additional information as attributes. Therefore, metadata that should be easy for the user to find and read has been stored in a separate "header" table, while metadata that is more relevant for the machine-based interpretation of the data has been stored as attributes. + +In the case of a text-based table, csv files are generated that are accompanied by a metadata file.
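A minimal sketch of how such an hdf5 file could be inspected with generic tools (using the h5py package; the file name, table names and attribute keys below are purely illustrative and not the actual schema):

```python
import h5py

# open a published event table in hdf5 format (file and dataset names are placeholders)
with h5py.File("km3net_open_events.h5", "r") as f:
    print(list(f.keys()))          # e.g. a human-readable "header" table and an "events" table
    header = f["header"][()]       # publication metadata intended for the reader
    events = f["events"][()]       # one row per event
    # machine-oriented metadata stored as attributes of the event table
    for key, value in f["events"].attrs.items():
        print(key, value)
```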
+ +| output format | provenance | parameters | data taking | publication | +| ------------- | ---------- | ---------- | ----------- | ----------- | +| hdf5 | file header | table header | table header | "header" table | +| csv table | metadata file | metadata file | metadata file | metadata file | + +##### Interfaces + +**VO server** +If the neutrino set is relevant for astrophysics analyses, a text file is generated and the metadata mapped to the [resource description format](https://dachs-doc.readthedocs.io/tutorial.html#the-resource-descriptor) required by the DaCHs software, with the [simple cone search (SCS)](https://ivoa.net/documents/cover/ConeSearch-20080222.html) protocol applied to it. In the ODC, the event sample is recorded as KM3OpenResource pointing to the service endpoints of the VO server. Thus, the data set is findable both through the VO registry and the ODC and accessible through VO-offered access protocols. + +**KM3NeT Open Data Server** +In the current test setup, event files that are not easily interpretable in an astrophysics context like the test sample from the ORCA detector, containing mostly atmospheric muons, are stored on the server, and registered as KM3OpenResource. While this practice is acceptable now for the relatively small datasets, the design of the server also allows in the future to point to external data sources and interface with storage locations of extended data samples. + + +### Multimessenger alerts +#### Data generation +Data generation and scientific use have been described in [the Multimessenger section](Multimessenger.md). The output of the online reconstruction chain is an array of parameters for the identified event as json *key: value* dictionary, which then is annotated with the relevant metadata to match the [VOEvent specifications](https://ivoa.net/documents/VOEvent/20110711/index.html). + +#### Data description + +The event information can, depending on its specific use, be divided into the following data or metadata categories. + +| (Meta)data type | content | +| ------------- | ------- | +| Event identification | event identifier, detector | +| Event description | type of triggers, IsRealAlert | +| Event coordinates | time, rightascension, declination, longitude, latitude | +| Event properties | flavor, multiplicity, energy, neutrino type, error box 50%, 90% (TOC), reconstruction quality, probability to be neutrino, probability for astrophysical origin, ranking | +| Publication metadata | publisher, contact | + +#### Technical specification + +##### Data structure & format +The VOEvent is stored as XML file which contains central sections of *WhereWhen, Who, What, How* and *Why*. + +##### VO Event specifications + +| Section | Description | (Meta)data | +| ------- | ----------- | ---------- | +| `<Who>` | Publication metadata | including VOEvent stream identifier | +| `<WhereWhen>` | Space-time coordinates | event coordinates offered in UTC (time) and FK5 (equatorial coordinates) and detector location | +| `<What>` | Additional parameters | event properties, event identifier | +| `<How>` | Additional information | description of the alert type | +| `<Why>` | Scientific context | details on the alert procedure | + + +##### Interfaces +The Alert receiving/sending is via the [GCN](https://gcn.gsfc.nasa.gov/). The Alert data will be the neutrino candidates in VOEvent format, which is the standard data format for experiments to report and communicate their observed transient celestial events facilitating for follow-ups. 
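For illustration, the parameter dictionary produced by the online reconstruction chain could resemble the following minimal sketch before being annotated and wrapped into a VOEvent (the keys follow the categories in the table above; all values are invented):

```python
# illustrative only: an alert candidate as a json-style key/value dictionary
alert_candidate = {
    "event_id": "km3.1.12345.678.9",     # hypothetical KM3NeT event identifier
    "IsRealAlert": False,                # test alert, not to be followed up
    "time": "2020-10-27T14:53:59Z",      # UTC
    "rightascension": 93.0,              # degrees, FK5
    "declination": -30.0,                # degrees, FK5
    "energy": 50.0,                      # reconstructed energy (illustrative unit: TeV)
    "error_box_50": 0.4,                 # degrees
    "error_box_90": 1.1,                 # degrees
    "probability_neutrino": 0.9,
    "probability_astrophysical": 0.5,
}
```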
The alert distribution is done via [Comet](https://comet.transientskp.org/en/stable/index.html), which is an implementation of the VOEvent transport protocol. + +Beyond this, there are also other receivers that could be implemented but are less convenient, e.g. the TNS for the optical alerts, the ZTF/LSST broker for the optical transients, the Fermi flare advocate for the Fermi blazar outbursts. + +For public alerts, KM3NeT will also submit notices and circulars (with a human in the loop) for dissemination. + + +### Supplementary services and data derivatives +#### Data generation +Providing context information on a broader scale, e.g. sensitivity services and instrument response functions, alongside the VO-published data sets is still under investigation and highly dependent on the specific information. Therefore, additional metadata for the interpretation of the format is required. + +#### Data description + +**Scientific use** + +Models and theoretical background information used in the analysis are provided, e.g. accompanying data sets (as for the ANTARES example dataset), to statistically interpret the data sets. Alternatively, probability functions for theoretical predictions, drawn from simulations, are considered for publication, including e.g. instrument response functions. + +**Metadata** + +Metadata here must be case specific: +* Description of the *structure of the data* (e.g. binned data, formula), which will be indicated by a content descriptor [ktype](Datamodels.md#ktype) and accompanied by type-specific additional metadata +* Description of the *basic data set* from which the information is derived, its scope in time and relevant restrictions of the basic domain, e.g. a description of the simulation sample +* Description of all relevant *parameters* + +#### Technical specification +##### Data structure & format +The data is provided as a csv table or json file, with the relevant metadata provided alongside the data in a separate text file or in a header section. + +##### Interfaces +Interpretation of the plot or service data is provided using the *openkm3* package, which loads the data as KM3OpenResource from the ODC and interprets it according to the *ktype*. The relevant data can then be accessed either as an array or, where applicable, be rendered directly to a plot using [matplotlib](https://matplotlib.org/), which can then be edited further. + + +### Acoustic hydrophone data +#### Data generation +Acoustic data acquisition as described in the [sea science section](SeaScience.md#acoustic-data) offers a continuous stream of digitized acoustic data that will undergo a filtering process according to the scientific target of the audio data. At this point, the raw acoustic data before filtering can be offered as example data to researchers interested in sea science. Snippets of acoustic data with a duration of a few minutes are produced at a fixed interval and directly offered, after format conversion, from a data server integrated in the acoustic data acquisition system and made accessible through a REST API. Integrating this data stream in the open science system therefore offers a good example of the use of a data stream that is offered externally to the ODC and grows by individual data sets over time. + +#### Data description + +**Scientific use** +The hydrophone data can be used, after triggering and filtering, for acoustic neutrino detection, detector positioning calibration and identification of marine acoustic signals, e.g.
originating from whales. In the unfiltered form, the acoustic data might primarily be of interest for sea science. + +**Metadata** +* *Publication metadata* is added during record creation at the ODC +* *Instrumentation & data taking settings* are offered for each data package through a separate endpoint (/info) of the REST API. + +#### Technical specification + +##### Data structure & format +Each data package consists of the same audio data, recorded in custom binary format (raw), which is formatted to [wave](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html) and [mp3](https://mpeg.chiariglione.org/) audio files. Additionally, statistical properties of the audio snipped are offered in a separate stream. + +| format | endpoint | description | return format | +| ------ | -------- | ----------- | ------------- | +| raw | /raw | custom binary format | application/km3net-acoustic | +| mp3 | /mp3 | mpeg encoded data | audio/mpeg | +| wave | /wav | wave format data | application/octet-stream | +| psd | /psd | array with mean, median, 75% and 95% quantile | application/json | + +##### Interfaces +For each file, a KM3OpenResource is registered in the ODC. All resources belonging to the same data type are grouped using the [KM3ResourceStream](Datamodels.md#datamodels) as metadata class, pointing to all resources of the data stream through the kid unique identifier. All streams belonging to the acoustic data service are grouped as KM3ResourceCollection. Thus, +each single resource can be addressed as well as the logical connection between the resources preserved. + +The data is directly accessible through the ODC webpage views or using openkm3 as client from a python interface. diff --git a/portal/content/articles/experts.md b/portal/content/articles/experts.md new file mode 100644 index 0000000000000000000000000000000000000000..d398654847099a8c4bb1ecaef75623d1eea7026c --- /dev/null +++ b/portal/content/articles/experts.md @@ -0,0 +1,255 @@ ++++ +date = "2020-10-27T14:53:59Z" +title = "Experts Corner" +type = "article" +draft = true ++++ + +--- +* The FAIR principles <FAIR> +* Data models +* Data servers + * VOserver + * KM3NeT server +* Repositories +* Quality assessment <Quality> +* [Copyright and Licensing](https://open-data.pages.km3net.de/licensing/) +--- + +# Expert Corner + +## Making data FAIR + +### Requirements for FAIR data +The widely-accepted paradigm for open science data publication requires the implementation of the [FAIR principles](https://www.go-fair.org/fair-principles/) for research data. This involves the + +* definition of descriptive and standardized **metadata** and application of persistent identifiers to create a transparent and self-descriptive data regime, +* Interlinking this data to common science platforms and **registries** to increase findability, +* possibility to harvest from the data through commonly implemented **interfaces** and +* definition of a **policy standard** including licensing and access rights management. + +In all these fields, the standards of KM3NeT are currently developing. In this development process, the application of existing standards especially from the astrophysics community, the development of dedicated KM3NeT software solutions and the integration of the KM3NeT efforts developed during the [KM3NeT-INFRADEV](https://www.km3net.org/km3net-infradev/) are integrated into the [ESCAPE project](https://projectescape.eu), which forms the main development environments for open data publication in KM3NeT. 
+ +### Compliance with the FAIR principles +The FAIR principles provide a solid set of requirements for the development of an open data regime. Following the FAIR requirements, the following solutions have been established in KM3NeT to enable FAIR data sharing and open science. + +#### Findable data +* **[Unique identifiers](pages/Datamodels.md#identifiers-and-content-description)** have been defined for digital objects within KM3NeT, including data files, software, collections and workflow steps, as well as identifiers for relevant data sets like particle detection ("event") in the detector. +* At the publication level, extended **[metadata sets](pages/Datamodels.md#resource-description)** are assigned to each published data product. +* The datasets can both be accessed via UID directly on the data servers as well as through external community-relevant **[repositories](pages/Repositories.md)**. + +#### Accessible data +* The data can, at this point, be directly accessed via a webpage and through a REST-API where data cannot be offered through VO protocols. +* At this point, **no authentication** is implemented, although in the future an authentication scheme is aimed for to allow access to unpublished data sets for a associated scientists. +* Records will be kept and transfer to long-term data repositories for high-level data sets is envisioned for archiving. + +#### Interoperable data +* Vocabularies and content descriptors are introduced that draw on external standards like VO standards or W3C standards where possible. +* **[Documentation on the metadata](pages/Datamodels.md)** and vocabularies are provided. +* Metadata classes are well-connected to allow the cross-referencing between different digital objects and extended metadata. + +#### Reusable data +* **[Licensing standards](https://open-data.pages.km3net.de/licensing/)** for data, software and supplementary material have been introduced. +* Basic **provenance** information is provided with the data, which serves as a development start point to propagate provenance management through the complex data processing workflow in the future. + + + +## Datamodels + +Metadata definition lies at the core of FAIR data, as it governs both the understanding of the data and as well as the interoperablity through access protocols. While some software can be used almost as-is, especially regarding the well-developed interfaces in the Virtual Observatory, the different data types and science fields that KM3NeT can link into requires a flexible approach and diverse application of software. In order to meet these various requirements, metadata and class definitions are developed within KM3NeT, drawing on well established standards e.g. of the [W3 Consortium](https://www.w3.org/), scientific repositories or the IVOA standards. + +Data published via the KM3NeT Open Data Center (ODC) is annotated as KM3OpenResource, which includes basic metadata for resource content, accessibility and identification. As resources can be provided either as part of a collection, e.g. data set or multiple resources related to an analysis, or as part of a stream of similar objects, e.g. of alert data, resources are grouped in the server as KM3ResourceCollection or KM3ResourceStream to facitlitate findability. Further details on these first data classes are documented in a [developing Git project](https://open-data.pages.km3net.de/datamodels/note.html#basic-classes-for-data-publication). In the future, further classes will be introduced and adapted governing e.g. 
the scientific workflow as discussed in [the according section](Workflow.md). + +### Resource description + +The **KM3OpenResource** class serves as base class to describe any KM3NeT open resource, be it a plot, dataset or publication. The information gathered here should be easily transformable to publish the resource to repositories like the Virtual Observatory or Zenodo based on [DataCite](https://datacite.org/index.html). As resource description metadata is widely based on standardized formats like [the Dublin Core standard](http://dublincore.org/), the KM3Resource class picks the relevant entries from the resource metadata, including [the VO Observation Data Model Core Components](http://ivoa.net/documents/ObsCore/) regarding the metadata specific to the scientific target, and the [VOResource description](http://ivoa.net/documents/VOResource/) and [Zenodo resource description](https://developers.zenodo.org/) for general resource metadata. + +### Identifiers and content description + +Identifiers serve to uniquely address digital objects. While [Digital Object Identifiers (DOIs)](https://www.doi.org/hb.html) are of long-standing use in the scientific community, these public identifiers have to link to an KM3NeT-internal identification scheme which allows to back-track the data generation and link between various data products related to a scientific target or publication. In addition to this, an ordering schema for class definitions and content descriptors helps in the interpretation of a specific digital object. To this end, the **ktype** and **kid** have been introduced. + +#### kid +The kid is a unique identifier which follows the [uuid schema](https://pubs.opengroup.org/onlinepubs/9629399/apdxa.htm). The uuid is ideally assigned at the generation of the digital object where possible and stored in the metadata set or header of the digital object. It is the goal to use kid assigment at all steps of data processing and has been implemented for all open science products. + +#### ktype +The **ktype** serves as a content descriptor and is defined as a string with a controlled vocabulary of words separated by ".", starting with "km3.". The selected vocabulary comprises domain names, class and sub-class names and, in some cases, identifiers for class instances, like + +``` +km3.{domain}.{subdomains}.{class}.{subclasses}.{instance} +``` + +e.g. "km3.data.d3.optic.events.simulation" for a data set of processed optic event data (data level d3) from Monte Carlo simulation, indicating a file class, or "km3.params.physics.event.reco.reconame.E" indicating the parameter definition of the reconstructed energy of particle events from a reconstruction algorithm named "reconame". + +#### Particle event identifiers +For various elements of data taking, identifiers are used to uniquely label e.g. different settings of software and hardware or annotate data streams. At the data aggregation level, an identifier therefore has to be introduced to uniquely identify a particle detection in one of the KM3NeT detectors. + +Due to the design of the data acquisition process, these events can be uniquely identified by + +* the detector in which they were measured, assigned a **detector id**, +* the run, i.e. data taking period during which it was detected, assigned a **run id**, +* the **frame index**, indicating the numbering of the data processing package in the DAQ system on which the triggering algorithms are performed and +* the **trigger counter**, i.e. 
the number of successes of the application of the set trigger algorithms. + +The internal KM3NeT event identifier is therefore defined as + +``` +km3.{detector_id}.{run_id}.{frame_index}.{trigger_counter} +``` + +## Platforms and Servers + +### The Virtual Observatory server + +The Virtual Observatory environment facilitates data publication and sharing among astrophysics community according to the standards set by the [IVOA](https://ivoa.net/). The VO protocols implement the FAIR data principles and ensure an environment for better scientific use and public access to data from astrophysical experiments. + +Dedicated software and exchange protocols are set up at participating data centers and tailored user programs grant easy user access to the diverse data through standardized labeling and description of the data. The KM3NeT collaboration is a data provider to the VO and operates a data server at [http://vo.km3net.de/](http://vo.km3net.de/), running the (DaCHS software)[http://docs.g-vo.org/DaCHS/]. + +#### Implementation + +**Software package** + +The [GAVO Data Center Helper Suite (DaCHS)](https://dachs-doc.readthedocs.io/index.html) is a "publishing infrastructure for the Virtual Observatory, including a flexible component for ingesting and mapping data, integrated metadata handling with a publishing registry, and support for many VO protocols and standards". It can be applied for KM3NeT purposes "as is" and allows to various types of data and formats to the VO. + +**Standards for metadata** +Entries to the VO registry are annotated using the [Resource Metadata standards](https://www.ivoa.net/documents/RM/20070302/index.html) of the IVOA. This metadata is required for publication. In the KM3NeT open science system, the KM3OpenResource class used to label all resources contains also entries which can be matched to the VO metadata standards, facilitating an automatic casting of the standard KM3NeT internal data format to the VO metadata. + +#### Data conversion and upload + +**Format conversion** + +In order to transform event-based data to a VO-compatible format, [standard scripts](https://git.km3net.de/open-data/voserver) have been set up to convert [neutrino event tables](Dataformats.md#particle-event-tables) and the according metadata into VO-compatible format and add the required metadata to the data set. To publish a dataset, the procedure includes + +* labeling the data set according to VO standards with information about origin, +authorship, parameter formats and standardized event identifiers. In the DaCHS software, this is handled through a resource description file, to which information from the KM3OpenResource description is cast. +* Selection of the service interface, i.e. the protocols which are offered through the various endpoints of the server to the Virtual Observatory. This interface is also defined in the resource description. + +**Publication procedure** + +Publication involves uploading and listing the data set as available in the registry of the server, which is handled through simple [administration commands]() in DaCHS. 
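Once imported and published in this way, a data set can be queried with generic VO client software; a minimal sketch using the pyvo package (the cone-search endpoint path and the search position are placeholders, not the actual service URL):

```python
import pyvo

# cone search against the KM3NeT VO server (endpoint path is a placeholder)
service = pyvo.dal.SCSService("http://vo.km3net.de/example/events/scs.xml?")
# select events within 5 degrees of an arbitrary sky position (RA, Dec in degrees)
results = service.search(pos=(93.0, -30.0), radius=5.0)
print(results.to_table())
```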
+ +#### Interlink to registries + +In order to declare the entries of the local server to the [IVOA Registry of Registries](http://rofr.ivoa.net/), the KM3NeT collaboration registered the server, obtaining an IVOA identifier (ivo://km3net.org) which is used in the VO context to reference KM3NeT resources. + +#### Example data + +As KM3NeT does not produce astrophysics data yet, an alternative data sample from the [ANTARES experiment](https://antares.in2p3.fr/) was used to set up the system. The ANTARES collaboration has already published two data sets to the VO by using the services of the German Astrophysical Virtual Observatory (GAVO), which runs and maintains the DaCHS software. The most recent public data sample was only made available through the ANTARES website and thus did not match the FAIR criteria. Using the [ANTARES 2007-2017 point source neutrino sample](Usecase_ANTARES.md), the KM3NeT VO server could be registered with the VO and now hosts this data set as example data. + +#### Data access +Tabulated high-level neutrino event data can be accessed utilizing access protocols like the Table Access Protocol (TAP) and query languages like the Astronomical Data Query Language (ADQL). To query these data sets related to astronomical sources, the Simple Cone Search (SCS) protocol allows specific events to be selected according to their source direction. + + +### The KM3NeT Open Data Center + +For all data not publishable through the IVOA, the KM3NeT Open Data Centre serves as interface and/or server to the data. For the setup of this server, the [Django framework](https://www.djangoproject.com/) was used. + +#### Implementation + +**Software package** + +Django is a python-based free and open-source web framework that follows the model-template-views (MTV) architectural pattern. Models implement e.g. the KM3OpenResource description, which serves as the basic data description class. Templates allow the display of the information through views, which are accessible via web browser. In addition to this, REST API endpoints can be defined to allow managing and querying the data through web requests. It also offers an admin interface which allows manual adaptation of the data via the GUI, and media storage which can serve to hold smaller data sets, along with a broad set of functions for url handling, testing, model description etc. + +**Standards for metadata** + +For the django server, which primarily serves as an interface to hold metadata for open science products, the metadata can be freely defined, and will develop with time depending on the requirements of the science communities the data is offered to. At this point, the models used by the server consist of the [following three classes](https://open-data.pages.km3net.de/datamodels/classdefs_overview.html): + +* **KM3OpenResource** describes each registered data element, independent of the actual type of data. It holds metadata on the publication, the content and a link to a description of the content, various optional identifiers like a DOI in addition to the km3net identifier (kid), a link to the storage location of the actual data and metadata on data access. + +* **KM3ResourceCollection** holds and describes links to several KM3OpenResources via their kid, and adds information on the collection. Resources of different kinds can be combined here, indicating e.g. that they belong to the same research object, e.g. the same analysis. + +* A **KM3ResourceStream** is used to group resources of the same type in a collection that can be extended with new resources over time.
These resources are automatically updated from the defined urls. + +#### Data conversion and upload + +**Format conversion** +The conversion of the data to be uploaded to the server or simply registered as KM3OpenResource was described in the section on [data sets](https://git.km3net.de/-/ide/project/open-data/openscienceportal/blob/master/-/pages/Dataformats.md#open-data-sets-and-formats). + +**Publication procedure** +The publication procedure involves upload of the data to the server, and adding the resource description to the database. If the data product does not yet have a kid assigned, a kid is added to the resource. With this step, the data is made publicly available. + +#### Additional features +The ODC will also be used to store metadata indepent of the publication of the data. It includes e.g. a kid search endpoint to allow to draw the metadata on any digital object registered with the server. This will allow to collect information e.g. on storage location of files and their descriptions. This serves as an interim solution to further the development of e.g. a full workflow management scheme. + +## Registries and Archiving + +Registering the KM3NeT open science products with well-used platforms and assigning global identifiers is a key to findability of the data. As several KM3NeT servers (VO server, Open Data Center, Gitlab server) currently hold key data and software, products from these servers must be linked to larger platforms, ideally through semi-automatic integation. +Also, at the end of the experiments, these products must be transferable from KM3NeT-hosted platforms to external repositories that ensure a longer lifetime of the digital products. + +### Virtual Observatory Registry of Registries + +#### Registry structure +In the VO, the KM3NeT server is registered as a registry of resources, i.e. of the datasets and services offered by the server. The [IVOA Registry of Registries (RofR)](https://ivoa.net/documents/Notes/RegistryOfRegistries/index.html) is a service maintained by IVOA that provides a mechanism for IVOA-compliant registries to learn about each other, being itself a compliant publishing registry which contains copies of the resource descriptions for all IVOA Registries. +When a resource metadata harvester harvests from these publishing registries, they can discover all published VO resources around the world. + +#### Identifiers +The KM3NeT VO server is registered with the following metadata to the RofR: + +* name: KM3NeT Open Data Registry +* IVOA Identifier: `ivo://km3net.org/__system__/services/registry` +* OAI service endpoint: [http://vo.km3net.de/oai.xml](http://vo.km3net.de/oai.xml) + +With this registration, KM3NeT data in the VO is fully findable within the Virtual Observatory, and each resource is identifiable through the naming of the individual endpoint of the service within the registry. + +#### Archiving +At the end of the operation of KM3NeT and, in case the server can no longer be operated, various national organizations in the KM3NeT member states offer long-term supported repositories, to which the data sets could be transfered. As example, the [GaVO Data Center](http://dc.zah.uni-heidelberg.de/) provided by *Zentrum für Astronomie Heidelberg* on behalf of the German Astrophysical Virtual Observatory already hosts other ANTARES data sets and is provider of the DaCHS software used in the KM3NeT VO server, so transfer for archiving would, at least from the current day's perspective, be easy to achieve. 
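The registration can be checked against the OAI-PMH endpoint listed above; a minimal sketch using the requests package (the "Identify" verb is part of the standard OAI-PMH protocol):

```python
import requests

# ask the publishing registry of the KM3NeT VO server to identify itself
response = requests.get("http://vo.km3net.de/oai.xml", params={"verb": "Identify"})
response.raise_for_status()
print(response.text[:500])   # beginning of the XML registry description
```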
+ +### DataCite and Zenodo +Universal citation of data and digital objects is generally handled through the assignment of a [Digital Object Identifier (DOI)](https://www.doi.org/), a persistent identifier or handle used to identify objects uniquely, which is standardized by the International Organization for Standardization (ISO). The VO does support the integration of a DOI in its resource metadata, but does not provide an authority to assign DOIs, as the organization assigning a DOI generally also has to host the data to ensure the longevity of the data resource. + +This leads to the dilemma that the KM3NeT collaboration would either have to become a member organization of e.g. [DataCite](https://datacite.org/), a global non-profit organisation that allows its member organizations to assign DOIs, and operate a repository, or copy or mirror the data to another repository in order to get a DOI. While this issue is not yet resolved and will be investigated further, also in the context of the [ESCAPE project](https://projectescape.eu/), where basic considerations are made with respect to the [EOSC](https://ec.europa.eu/info/publications/persistent-identifier-pid-policy-european-open-science-cloud_en), currently the second option is chosen for the small amount of data provided in this demonstrator. As hosting repository, [Zenodo](https://zenodo.org/) was chosen, a well-established data repository in the physics community. + +#### Registry structure +Zenodo was initiated by the [OpenAIRE project](https://www.openaire.eu/) and is hosted by CERN to offer a repository for research funded by the European Commission. It hosts publications, audio and visual media, software and datasets. It allows the creation of "communities" to group various resources, which has been used to create a [KM3NeT community](https://zenodo.org/communities/?p=KM3NeT). Uploads can be managed through an API as well as through a web interface, and standardized metadata is required. + +#### Identifiers +Zenodo assigns DOIs to resources on upload if no DOI is provided, making the dataset also easily citable. For this demonstrator, the event sample from the KM3NeT ORCA use case ("one week of ORCA") will be registered with Zenodo. + +#### Archiving +As Zenodo offers long-term support of its resources and the data is already mirrored to the repository, archiving of the data can easily be accomplished. + +### Software integration + +#### Github, Zenodo and pypi +With [Github](https://github.com), a major platform exists for software development that also allows easy interaction between software developers on various projects. Open KM3NeT software is mirrored to Github, making the software findable for software developers. For grouping of the software, a [KM3NeT collection](https://github.com/KM3NeT) has also been established here. + +For python-based software, easy installation via the pip package installer is integrated into the software packages. This installer links to the [Python Package Index (PyPI)](https://pypi.org/), a repository of software for the Python programming language. Here, a [KM3NeT user account](https://pypi.org/user/km3net/) has been established to group and administrate the software. + +#### Identifiers +Neither Github nor PyPI makes software citable in the strict sense, as they do not assign DOIs. However, Zenodo allows the integration of Github repositories into its platform.
Therefore, the combination of mirroring software to Github, making it installable via PyPI and registering it with Zenodo makes the software accessible for community development, easily installable for users and citable in a scientific context. + +#### Archiving +Both PyPI and Zenodo copy the relevant software to their platforms, so that copies of the source code are stored at multiple sites, which makes archiving easy. + +## Data Quality assessment + +The processes involved in the KM3NeT data processing chain can be grouped into a few main categories. Although the ordering of these categories is not strictly hierarchical from the point of view of data generation and processing, in the context of a general discussion one can safely assume that a hierarchical relation exists between them. From bottom to top, these categories are data acquisition, detector calibration, event reconstruction, simulations and, finally, scientific analyses based on the data processed in the previous categories. The quality of the scientific results produced by KM3NeT will be affected by the performance of the different processes involved in the lower levels of the data processing chain. In order to implement a complete and consistent set of data quality control procedures that span the whole data processing chain, it is required to have a complete, unambiguous and documented strategy for data processing at each of the aforementioned process categories. This includes the setting of data quality criteria, which should be initiated at the highest level of the data processing chain and propagated towards the lowest levels. For each of the aforementioned categories there exists a working group within the KM3NeT collaboration. It therefore falls to each of these working groups to develop its objectives and procedures according to the scientific needs of KM3NeT. Currently such a documented strategy does not exist for any working group. It has therefore not been possible to develop a full strategy for data quality control. Nevertheless, there have been numerous software developments devoted to quality control along the different stages of the data processing chain. In the following, a description of some of the existing quality control tools and procedures is given. This description can be regarded as an incomplete prototype for a data quality plan. The implementation of these procedures into an automated workflow requires the design and implementation of a standardised data processing workflow which meets software quality standards. This does not exist either. Some of the figures and results shown here have been produced ad hoc, and not as a result of any working system. + +### Data quality control procedures + +#### Online Monitor + +During the data acquisition process, the online monitoring software presents real-time plots that allow the shifters to promptly identify problems with the data acquisition. It includes an alert system that sends notifications to the shifters if problems appear during data taking that require human intervention. The online monitor uses the same data that are stored for offline analyses (this is actually not true, and should be changed). This implies that any anomaly observed during the detector operation can be reproduced offline. {HERE, A FIGURE COULD BE GIVEN AS AN EXAMPLE.
FOR INSTANCE, THE MESSAGES ON THE CHAT WHEN THE TRIGGER RATE IS 0.} + +#### Detector Operation + +As explained in XX.YY, the optical data obtained from the detector operation are stored in ROOT files and moved to a high-performance storage environment. The offline data quality control procedures start with a first analysis of these files, which is performed daily. It mainly focuses on, but is not restricted to, the summary data stored in the ROOT files. The summary data contain information related to the performance of the data acquisition procedures for each optical module in the detector. As a result of this first analysis, a set of key-value pairs is produced, where each key corresponds to a parameter that represents a given dimension of data quality and the value represents the evaluation of this parameter for the livetime of the analysed data. The results are tagged with a unique identifier corresponding to the analysed data set and uploaded to the database. In the present implementation the analysis is performed for each available file, where each file corresponds to a data taking run, although this may change in the future as the data volume generated per run will increase with the detector size. + +A further analysis of the results stored in the database includes the comparison of the values of the different parameters to reference values, allowing for a classification of data periods according to their quality. The reference values are typically set according to the accuracy with which the current detector simulations reproduce the different quality parameters. In addition, the evolution of the different quality parameters can be monitored and made available to the full collaboration as reports. Currently this is done every week by the shifters, and the reports are posted on an electronic log book (ELOG). Figure {FIG} shows an example of the time evolution of two quality parameters during the period corresponding to the data sample that is provided together with this report. The selected runs correspond to a period of stable rates during which the different quality parameters were within the allowed tolerance. + +#### Calibration + +The first step in the data processing chain is to determine the detector calibration parameters using the data obtained from the detector operation. These parameters include the time offsets of the PMTs as well as their gains and efficiencies, and the positions and orientations of the optical modules. The PMT time offsets and the positions of the optical modules are used in later stages of the data processing chain for event reconstruction, as well as by the real-time data filter during the detector operation. While the event reconstruction requires an accurate knowledge of these parameters, the algorithms used by the real-time data filter depend only loosely on them, and its performance is not affected by variations occurring within a timescale of the order of months. Nevertheless, it is still necessary to monitor these parameters and correct the values used by the data filter if necessary. The performance of the detector operation also depends on the response of the PMTs, which is partly determined by their gains. These evolve over time, and they can be set to their nominal values through a tuning of the high voltage applied to each PMT. Monitoring the PMT gains is therefore also necessary to maximise the detector performance. Additionally, the PMT gains and efficiencies are also used offline by the detector simulation.
Within the context of data quality assessment, software tools have been developed by KM3NeT that allow the parameters described above to be monitored and compared to reference values, raising alerts when necessary. The reference values should be determined by the impact of miscalibrations on the scientific goals of KM3NeT; this work has not yet been addressed. The arrangement of these tools into a workflow requires the elaboration of an underlying calibration strategy. This has not been done, and the work is therefore on hold. + +#### Simulations and Event reconstruction + +Once the calibration constants have been determined, the data processing chain continues with the event reconstruction, and with the simulation and reconstruction of an equivalent set of events where the information in the summary data is used to simulate the data taking conditions. The simulation of particle interactions and propagation is done by dedicated software, while the detector simulation and event reconstruction are done by Jpp. As a result of the simulation chain, a ROOT file is obtained which has the same format as the ROOT file produced by the data acquisition system. This contains events obtained after the simulation of the detector trigger. The resulting file and the corresponding data taking file are processed identically by the reconstruction software, which produces ROOT-formatted files with the results of reconstructing the real data events and the simulated events, respectively. The agreement between data and simulations is an important measure of the quality of the data, and it can be checked at trigger level and at reconstruction level. In both cases, the comparison follows the same strategy: the ROOT files are used to produce histograms of different observables and these histograms are saved into new ROOT files. A set of libraries and applications devoted to histogram comparisons has been developed in Jpp. These implement multiple statistical tests that can be used to determine whether two histograms are compatible, as well as the degree of incompatibility between them. Additionally, tools have been developed that allow the results to be summarised into a single number per file, which represents the average result after comparing all the observables. For the example provided here, the discrepancy between data and Monte Carlo is measured through the calculation of the reduced $\chi^2$ for each observable, and the summary is given as the average reduced $\chi^2$ of all the compared observables for each file. Figures {X} and {Y} show the value of this parameter as a function of the run number. + +These tools would also allow for the calculation of other data quality metrics by comparing the data with figures of merit for different observables. For this, the development of the plans and strategies mentioned in the introduction of this section is necessary. + +The contents of the files produced by the reconstruction routines are the main ingredients for the physics analyses, though these analyses are typically done with dedicated software frameworks which require a special formatting of the data. The data format used for physics analyses is called the 'aanet' format, which is also based on ROOT. Control mechanisms are thus needed to ensure consistency between the files produced by Jpp and the aanet-formatted files. Consistency between Jpp and aanet files can be verified up to a certain level by producing distributions of the different parameters and by verifying that these distributions are identical.
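As an illustration of the summary metric described above (a sketch only, not the Jpp implementation), the reduced $\chi^2$ between a measured and a simulated histogram, averaged over observables, could be computed along these lines:

```python
import numpy as np

def reduced_chi2(data_counts, mc_counts):
    """Illustrative reduced chi-square between a data and a simulated histogram (Poisson errors)."""
    data = np.asarray(data_counts, dtype=float)
    mc = np.asarray(mc_counts, dtype=float)
    mask = (data + mc) > 0                        # ignore empty bins
    chi2 = np.sum((data[mask] - mc[mask]) ** 2 / (data[mask] + mc[mask]))
    return chi2 / mask.sum()                      # normalise by the number of filled bins

# invented example: one histogram pair per observable, averaged into a per-file summary
observables = {
    "number_of_hits": ([120, 95, 60, 22], [115, 100, 55, 25]),
    "zenith_angle":   ([80, 150, 140, 70], [85, 140, 150, 65]),
}
file_summary = np.mean([reduced_chi2(d, m) for d, m in observables.values()])
print(file_summary)
```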
+ + + +## Copyright and Licensing + +Find more infos [here](https://open-data.pages.km3net.de/licensing/) diff --git a/portal/content/articles/science.md b/portal/content/articles/science.md new file mode 100644 index 0000000000000000000000000000000000000000..2590a766497f8c68a71d0ff149a789f00b3ccb58 --- /dev/null +++ b/portal/content/articles/science.md @@ -0,0 +1,163 @@ ++++ +date = "2020-10-27T14:53:59Z" +title = "Science" +type = "article" +draft = true ++++ + +--- +* Science Targets + * Multimessenger + * Sea Science +* Detector +* Simulation + +--- + +# Our Science + +## Scientific Targets + +The KM3NeT neutrino detectors will continuously register neutrinos from the whole sky. The neutrinos of astrophysical interest, i.e. those from extra-terrestrial origin, need to be identified in the background of atmospheric neutrinos, i.e. those created in Earth’s atmosphere by interactions of cosmic-ray particles. Access to cosmic neutrino data is of high importance for a wide astrophysics community to relate cosmic neutrino fluxes to observations by other neutrino observatories or using [other messengers](Multimessengers.md), and to compare them with theoretical predictions. The atmospheric neutrinos carry information on the particle physics processes in which they are created and on the neutrinos themselves. These data are relevant for a wide astroparticle and particle physics community. Finally, KM3NeT will monitor marine parameters, such as bioluminescence, currents, water properties and transient acoustic signals and provides user ports for Earth and Sea sciences. + + +### Astro Physics + +The main science objective of KM3NeT/ARCA is the detection of high-energy neutrinos of cosmic origin. Neutrinos represent an alternative to photons and cosmic rays to explore the high-energy Universe. Neutrinos can emerge from dense objects and travel large distances, without being deflected by magnetic fields or interacting with radiation and matter. Thus, even modest numbers of detected neutrinos can be of utmost scientific relevance, by indicating the astrophysical objects in which cosmic rays are accelerated, or pointing to places where dark matter particles annihilate or decay. + +The detector design of the KM3NeT/ARCA has been optimised to target astrophysical neutrinos at TeV energies and above in order to maximise the sensitivity to detect neutrinos from the cosmic ray accelerators in our Galaxy. In a neutrino telescope like ARCA, two main event topologies can be identified: Firstly, the 'track' topology indicates the presence of muons produced in charged current muon neutrino interactions and tau neutrino interactions with muonic tau decays. Muons are the only class of particles that can be confidently identified, because they are the only particles that appear as tracks in the detector. Secondly, the shower topology refers to a point-like particle shower from neutral current interactions of all three neutrino flavours, the charge current interactions of electron neutrino and tau neutrino interactions with non-muonic tau decays. For tau neutrinos at sufficiently high energies (E > 100TeV), the produced tau lepton can travel several metres before decaying, resulting in distinguishable two individual showers. This allows identification of tau neutrinos with a clear signature of the flavour of the neutrino primaries. All neutrino flavours can be used for neutrino astronomy. 
+ +The preferred search strategy is to identify upward-moving tracks, which unambiguously indicates neutrino reactions since only neutrinos can traverse the Earth without being absorbed. A neutrino telescope in the Mediterranean Sea on the Northern hemisphere of the Earth is well suited for this purpose, since most of the potential Galactic sources are in the Southern sky. + +Besides all-favour neutrino astronomy, i.e. investigating high-energy cosmic neutrinos and identifying their astrophysical sources, additional physics topics of ARCA include + +- [multi-messenger studies](Multimessengers.md) +- particle physics with atmospheric muons and neutrinos +- indirect searches for dark matter + +The ARCA detector allows to reconstruct the arrival direction of TeV-PeV neutrinos to sub-degree resolution for track-like events and ~2 degree for shower-like events. The energy resolution is about ~0.27 in log10(E_\mu) for muons above 10TeV, while for showers a ~5% resolution on the visible energy is achieved. In order to achieve these resolutions, typically a set of quality selection criteria are applied based on the output of the event reconstructions. + +Further details on the detector performance can be found [here](https://iopscience.iop.org/article/10.1088/0954-3899/43/8/084001). + + +### Neutrino Physics + +Neutrinos have the peculiar feature that they can change from one flavour to another when propagating over macroscopic distances. This phenomenon of neutrino flavour change is known as 'neutrino oscillation'. The Nobel Prize in Physics of the year 2015 was awarded to T. Kajita and A. B. McDonald for the discovery of neutrino oscillations, which shows that neutrinos have mass [1]. One open question is the so-called 'neutrino mass ordering'. It refers to the sign of one of the two independent neutrino mass differences, the absolute value of which has already been known for more than two decades. + +The main science objective of KM3NeT/ORCA is the determination of the ordering of the three neutrino mass eigenstates by measuring the oscillation pattern of atmospheric neutrinos. Atmospheric neutrinos are produced in cosmic-ray air-showers in the Earth atmosphere. When produced on the other side of the Earth and traversing the Earth towards the detector, atmospheric neutrinos oscillate, ie. change their flavour between production and detection. The oscillation pattern in the few-GeV energy range is sensitive to the neutrino mass ordering and other oscillation parameters. + +Besides determining the neutrino mass ordering, additional science topics of ORCA include: + +- testing the unitary of the neutrino mixing matrix by studying tau-neutrino appearance +- indirect searches for sterile neutrinos, non-standard interactions and other exotic physics +- indirect searches for dark matter; testing the chemical composition of the Earth's core (Earth tomography) +- low-energy neutrino astrophysics + +The detector design of the KM3NeT/ORCA has been optimised for atmospheric neutrinos in the 1-100GeV energy range in order to maximise the sensitivity of determining the neutrino mass ordering. For neutrino oscillation measurements with KM3NeT/ORCA, the capability to differentiate the two event topologies -- i.e. the track-shower separation power -- is very important. 
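For reference, the oscillation pattern referred to above follows, in the standard two-flavour vacuum approximation, the well-known survival probability

$$P_{\nu_\mu \rightarrow \nu_\mu} \approx 1 - \sin^2(2\theta)\,\sin^2\!\left(1.27\,\frac{\Delta m^2\,[\mathrm{eV}^2]\;L\,[\mathrm{km}]}{E\,[\mathrm{GeV}]}\right),$$

which illustrates the dependence on the baseline-to-energy ratio $L/E$ that makes atmospheric neutrinos traversing the Earth sensitive to the mass splitting and the mixing angle.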
+ +The detector performance of the ORCA detector is summarised in detail [here](https://iopscience.iop.org/article/10.1088/0954-3899/43/8/084001) + + +### Multi-Messenger Astrophysics + +The multi-messenger approach in astrophysics means looking for at least two or more cosmic messenger particles to study the transient phenomena in our Universe, such as gamma-ray burst, the outburst of active galactic nuclei, fast radio burst, supernova explosion, etc. Using multiple messengers greatly extends our understanding of the Universe compared to using one single channel. The cosmic messengers include electromagnetic waves, cosmic rays, gravitational waves, and neutrinos. + +Some of the most important open questions in astrophysics are the origin of astrophysical neutrinos, the origin of cosmic rays, acceleration mechanics of high energy cosmic rays, etc. Multi-messenger studies can help answer these questions. + +Up to now, there are three successful multi-messenger detections: + +1) In 1987, the observation of the supernova 1987A, where neutrinos are observed in neutrino experiments about 2 or 3 hours before the visual observations. +2) 30 years later in 2017, the observation of the gravitational wave and the electromagnetic observations of the gamma-ray burst observed by Fermi and Integral. +3) TXS 0506, where for the first time, blazars are identified as a neutrino source. + +#### Importance of Neutrinos for Multi-Messenger Studies + +Among those multiple messengers, neutrinos are an important type of messenger. Neutrinos are neutral and only interact via gravity and weak interactions. Neutrinos point back to their sources where they were created. + +For example, cosmic rays are charged particles, thus they are deflected by the galactic magnetic fields. Cosmic ray observatories can detect them but their observed arrival directions do not point back to their sources. During the propagation of cosmic rays, neutrinos are produced during the interaction of cosmic rays and the extragalactic background light. Since neutrinos are not bent by magnetic fields, they can act as good tracers for studying the propagation of cosmic rays. + +Looking for coincidences of neutrinos and electromagnetic or GW counterparts may also reveal subthreshold events that otherwise do not generate interest within each single observatory, or even reveal new sources. + +Because neutrinos travel with nearly the speed of the light, a real-time or near real-time alert system based on neutrinos (with a good angular resolution) is possible. It is vital for follow-ups for some high energy transient sources that are time-dependent with the flux quickly varying. For example, a real-time neutrino alert will be able to point to a direction for space electromagnetic observatories that have a small sky coverage (e.g. Fermi-LAT) to conduct their search in a timely fashion. + +#### KM3NeT Multi-Messenger Neutrino Alerts + +With the Southern sky including the Galactic Center in view and a good angular resolution, KM3NeT will contribute greatly to the multi-messenger community. For a detailed description of the detector, see [the detector page](Detector.md). + + +* For the search of neutrinos via event reconstructions based on (causally coincident) lit-up DOMs, focusing on high energy neutrinos, such as astrophysical neutrinos, KM3NeT will be able to send/receive alerts from/to the multi-messenger community: + + 1. to receive external alerts, i.e. alerts generated by external partner experiments (e.g. 
+     gravitational-wave alerts from LIGO/Virgo, neutrino alerts from other neutrino experiments) via [GCN](https://gcn.gsfc.nasa.gov/) and search for correlated neutrinos in KM3NeT.
+  2. to send neutrino alerts to GCN for events observed in KM3NeT, including multiplet alerts, possible astrophysical neutrinos and any correlated neutrinos found in the above-mentioned correlation searches. These alerts allow external partner experiments to conduct their own correlation searches and follow-ups.
+
+
+* For the search for MeV core-collapse supernova (CCSN) neutrinos, each KM3NeT DOM acts as a detector. A burst of CCSN neutrino interactions leads to higher counting rates of the individual PMTs and to an increased number of coincident lit-up PMTs in the same optical module. This search uses a different method from the usual neutrino event reconstruction route and has a separate alert system, the [SuperNova Early Warning System (SNEWS)](https://snews.bnl.gov/), so KM3NeT supernova alerts are not discussed here. KM3NeT is already connected to SNEWS.
+
+#### KM3NeT Alert Types
+
+The alert types include:
+
+* MeV core-collapse supernova alerts (SNEWS), already online.
+* Multiplet alerts: multiple neutrinos from the same direction within a short time window, suggesting a potential neutrino source.
+* High-energy neutrino alerts: potential neutrinos from astrophysical sources (the higher the energy, the higher the probability of an astrophysical origin).
+* Any neutrinos correlated with external alerts.
+* Other alerts still to be defined, or subcategories of the high-energy neutrino alerts if necessary (e.g. separate track and cascade high-energy alerts).
+
+For alert data formatting and sending, see [the dataformat definition](Dataformats.md#multimessenger-alerts).
+
+### Sea Science
+
+#### Environmental data
+
+The KM3NeT research infrastructure will also house instrumentation for Earth and Sea sciences for long-term and on-line monitoring of the deep-sea environment. Until now, measurements in the deep sea have typically been performed by deploying and recovering autonomous devices that record data over periods of months to years. This method is severely constrained by bandwidth limitations, by the absence of real-time interaction with the measurement devices and by the delayed access to the data. A cabled deep-sea marine observatory like KM3NeT remedies these disadvantages by essentially providing a power socket and a high-bandwidth Ethernet connection at the bottom of the sea. This is an important and unique opportunity for performing deep-sea research in the fields of marine biology, oceanography, environmental sciences and geosciences. To this end, both the French and Italian KM3NeT sites are nodes of the European Multidisciplinary Seafloor and water column Observatory [EMSO](http://emso.eu).
+
+EMSO sea science instrumentation modules will host sensors that provide real-time monitoring of a plethora of environmental parameters, including temperature, pressure, conductivity, oxygen concentration, turbidity and sea current. Additional instrumentation, including a benthic crawler, a seismograph, a deep-sea germanium gamma detector and a high-speed, single-photon [video camera for bioluminescence studies](https://www.appec.org/news/km3net-is-growing-recent-deployment-in-the-mediterranean-sea), will also be installed. Furthermore, the KM3NeT optical modules themselves provide invaluable data for studies of deep-sea bioluminescence and for bioacoustic monitoring of the local cetacean populations.
+For example, acoustic signals from whales and dolphins can be detected with the [acoustic sensors](https://www.researchitaly.it/en/projects/big-electronic-ears-spy-on-sperm-whales-in-the-mediterranean-sea/). Another example is the possibility of using the optical fibres in the main electro-optical cables, which run for many tens of kilometres along the seafloor, for [seismological studies](https://eartharxiv.org/ekrfy).
+
+#### Acoustic data
+
+The telescope is also equipped with acoustic sensors: one piezo-electric ceramic sensor is installed in each DOM, and one hydrophone on each DU base and on the junction boxes of the seafloor network. The main purpose of these sensors is to provide real-time positioning of each DOM with centimetre accuracy. The hydrophone sensitivity (omnidirectional) is about -173 dB re 1 V/uPa over a frequency band from a few tens of Hz to 70 kHz, which makes this sensor also suitable for interdisciplinary studies.
+
+The default KM3NeT DAQ has been designed to identify, within the full data stream, only the (known) acoustic signals emitted by the long-baseline acoustic positioning system (Figure 2). A further implementation of the DAQ is foreseen in the future, permitting on-line analysis, display and recording of the underwater noise spectrum through a dedicated channel. A large number of use cases has been identified, as described [in this document](https://www.km3net.org/wp-content/uploads/2020/02/D8.1-KM3NeT-rules-of-access-for-external-users.pdf).
+
+![Acoustic data acquisition system, from [2]](/figures/Acoustic_data.png)
+
+At this stage, the unfiltered acoustic data can be made available directly from the ADF and is provided in several formats through a REST API on a data server integrated into the acoustic data processing system of KM3NeT.
+
+## Detector
+
+The KM3NeT Research Infrastructure will consist of a network of deep-sea neutrino detectors in the Mediterranean Sea with user ports for Earth and Sea sciences.
+
+The KM3NeT neutrino detectors employ the same technology and neutrino detection principle, namely a three-dimensional array of photosensors used to detect the Cherenkov light produced by relativistic particles emerging from neutrino interactions. From the arrival times of the Cherenkov photons (~nanosecond precision) and the positions of the sensors (~10 cm precision), the energy and direction of the incoming neutrino, as well as other parameters of the neutrino interaction, can be reconstructed. The main difference between the detector designs is the density of photosensors, which is optimised for the study of neutrinos in the few-GeV (ORCA) and TeV-PeV (ARCA) energy ranges, respectively.
+
+A key technology of the KM3NeT detectors is the Digital Optical Module (DOM), a pressure-resistant glass sphere housing 31 small 3-inch photomultiplier tubes (PMTs), their associated electronics and calibration devices. The segmented photocathode of the multi-PMT design allows for uniform angular coverage, single-photon counting capabilities and information on the photon arrival direction. The DOMs are distributed along flexible strings, one end of which is fixed to the sea floor while the other end is held close to vertical by a submerged buoy. Each string comprises 18 DOMs. The strings are connected to junction boxes that provide connections for power and data transmission.
+
+A collection of 115 strings forms a single KM3NeT building block.
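+
+A quick back-of-the-envelope check ties these numbers together; the short sketch below uses only the figures quoted in this section and reproduces the approximate PMT counts given for the ORCA and ARCA arrays in the next paragraph.
+
+```python
+# Back-of-the-envelope PMT count per KM3NeT building block,
+# using only the numbers quoted in the detector description above.
+PMTS_PER_DOM = 31
+DOMS_PER_STRING = 18
+STRINGS_PER_BLOCK = 115
+
+pmts_per_block = PMTS_PER_DOM * DOMS_PER_STRING * STRINGS_PER_BLOCK
+print(f"PMTs per building block: {pmts_per_block}")       # 64170, i.e. ~65 000 (one ORCA block)
+print(f"PMTs in two ARCA blocks: {2 * pmts_per_block}")   # 128340, i.e. ~130 000
+```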
+The modular design allows building blocks with different spacings between strings and DOMs, in order to target different neutrino energies. In KM3NeT Phase 2.0, three building blocks are foreseen: two KM3NeT/ARCA blocks, with a large spacing to target astrophysical neutrinos at TeV energies and above, and one KM3NeT/ORCA block, to target atmospheric neutrinos in the few-GeV range.
+
+The ARCA (Astroparticle Research with Cosmics in the Abyss) detector is being installed at the KM3NeT-It site, 80 km off the Sicilian coast near Capo Passero (Italy), at a sea-bottom depth of about 3450 m. About 1 km^3 of seawater will be instrumented with ∼130000 PMTs. The ORCA (Oscillation Research with Cosmics in the Abyss) detector is being installed at the KM3NeT-Fr site, 40 km offshore from Toulon (France), at a sea-bottom depth of about 2450 m. A volume of about 8 Mton is instrumented with ∼65000 PMTs.
+
+Technical details on the detector design are given in [1].
+
+### Data Acquisition
+
+The readout of the KM3NeT detector is based on the 'all-data-to-shore' concept, in which all analogue signals from the PMTs that pass a reference threshold are digitised. These data contain the time at which the analogue pulse crosses the threshold level, the time that the pulse remains above the threshold level (known as time-over-threshold, or ToT), and the PMT address. This is typically called a *hit*. All digital data (about 25 Gb/s per building block) are sent to a computing farm onshore, where they are processed in real time.
+The recorded data are dominated by optical background noise: Cherenkov light from K40 decays in the seawater as well as bioluminescence from organisms in the deep sea. Events of scientific interest are filtered from the background using dedicated software, which exploits the time-position correlations following from causality. To maintain all available information for the offline analyses, each event contains a snapshot of all the data in the detector during the event.
+
+For calibration purposes, summary data are written out, containing the count rates of all PMTs in the detector (with a sampling frequency of 10 Hz). This information is used in the simulations as well as in the reconstruction to take into account the actual status and optical background conditions of the detector.
+
+In parallel to the optical data, acoustic data and instrument data are recorded. Their main purpose is position calibration of the DOMs, which is necessary as the detector elements can move under the influence of sea currents. The acoustic data include the processed output from the piezo sensors in the DOMs and from the hydrophones in the base modules of the strings. The instrument data include the processed output from the compasses, temperature sensors and humidity sensors inside the DOMs.
+
+During operation, the continuous data stream sent by the detector is split into small time intervals, called *runs*, with typical durations of a few hours. This is done for practical reasons of the data acquisition. In addition, this procedure allows a set of run periods with high-quality data to be selected, based on the monitored detector status, environmental conditions and data quality. The calibration for timing, positioning and photon detection efficiency is done offline using the calibration data.
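+
+To make the hit description above concrete, the following sketch shows the minimal content of such a hit and a naive causality check between two hits. It is an illustration only: the field names, units and tolerance are chosen for readability and do not correspond to the actual KM3NeT DAQ data format or trigger implementation.
+
+```python
+from dataclasses import dataclass
+
+# Illustrative sketch only: not the actual KM3NeT DAQ format or trigger code.
+C_WATER_M_PER_NS = 0.22  # approximate speed of light in seawater [m/ns]
+
+@dataclass
+class Hit:
+    pmt_address: int   # which PMT registered the pulse
+    time_ns: float     # time at which the pulse crossed the threshold
+    tot_ns: float      # time-over-threshold of the pulse
+    x: float           # PMT position [m]
+    y: float
+    z: float
+
+def causally_related(a: Hit, b: Hit, tolerance_ns: float = 10.0) -> bool:
+    """Two hits can stem from the same particle only if their time difference
+    does not exceed the light travel time between the two PMTs (plus a tolerance)."""
+    distance = ((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2) ** 0.5
+    return abs(a.time_ns - b.time_ns) <= distance / C_WATER_M_PER_NS + tolerance_ns
+```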
+
+## Detector and event simulations
+
+In order to evaluate and test physical models of neutrino production and interaction, a wide variety of simulations needs to be performed and compared to the data taken with the KM3NeT detectors. Several parts of the detector are modelled, including the photomultiplier tube characteristics, the complex electronic components that process their signals on sub-nanosecond time scales, the high-throughput data distribution over heterogeneous networks, and also the physical properties of the environment (seawater, atmosphere, ...) and of the materials used. All of these are carefully taken into account when simulating the overall detector response to particle interactions.
+
+The first level of event simulation starts with the incoming primary particle: neutrinos produced in cosmic events, or for example atmospheric neutrinos and muons produced by cosmic radiation in the Earth's atmosphere. These primary particles eventually trigger events in the detector after a usually long chain of interactions. The second level of the simulation chain takes care of the propagation of these particles, and of additional particles produced along their way through the atmosphere, Earth and seawater - depending on their travel path - until they reach the detector volume. In the final step, the light produced by the particles is simulated and propagated to the highly sensitive optical modules, where it is digitised and passed to the above-mentioned hardware
+response simulation.
+
+The full event simulation is implemented in a run-by-run simulation strategy building on the so-called data runs, the standard data-taking intervals of several hours. The detector response is simulated individually for these periods. Since large statistics are required for precise analyses, the simulation data will significantly exceed the real data in volume.
+
+### Event simulation derivatives as service
+
+As handling these large data sets is impractical for inter-experimental studies, while the information is crucial for the interpretability of the data, parameterised distributions of relevant observables need to be derived from the simulation data sets and offered as services. Even in the absence of significant neutrino measurements during the construction phase of KM3NeT, offering sensitivity estimates for given models is beneficial for the development of common research goals, and the development of a corresponding open service is currently under investigation.
+
diff --git a/portal/content/articles/software.md b/portal/content/articles/software.md
new file mode 100644
index 0000000000000000000000000000000000000000..c000b646f90a86fd94e1a5e1aa81b278bc75b2db
--- /dev/null
+++ b/portal/content/articles/software.md
@@ -0,0 +1,80 @@
++++
+date = "2020-10-27T14:53:59Z"
+title = "Software"
+type = "article"
+draft = true
++++
+
+---
+* Python
+* Software@Git <Git>
+* Docker containers <Docker>
+---
+
+# Our Software
+
+## @Gitlab
+
+KM3NeT uses a self-hosted GitLab instance as the main platform to develop and discuss software, analysis tools, papers and other private or collaborative creations. GitLab offers professional and advanced features to keep track of the development history, and its rich feature set allows thoughts and ideas to be exchanged and archived easily. The continuous integration (CI) system that is part of the GitLab distribution proves to be a powerful automation tool and is utilised to generate consistently up-to-date test reports, documentation and software releases in a transparent way.
+The CI pipeline is triggered every time changes are pushed to a project. Each job runs in an isolated Docker container, which makes it fully reproducible.
+
+
+
+In the case of test reports, for example, failing tests are flagged in merge requests and prevent the changes that broke them from being merged accidentally. The documentation is also built in a dedicated pipeline job and is published to the web upon successful generation. A tight integration of the documentation into the software projects is mandatory and greatly helps to keep it up to date.
+
+The KM3NeT GitLab server is accessible to the public, but only projects which are marked as _global_ are visible to a regular visitor without a KM3NeT account. Such visitors can download these projects and all their public branches and access the issues, documentation and wiki; they are, however, not allowed to collaborate, i.e. to comment or contribute in any way. To circumvent this limitation, open-source projects are mirrored to a (still unofficial) GitHub group (https://github.com/KM3NeT) where everyone with a GitHub account is allowed to interact.
+
+## Docker
+
+Due to the huge variety of operating systems, languages and frameworks, the number of possible system configurations has grown rapidly in the past decades. Operating-system-level virtualisation is one of the most successful techniques to tackle this problem; it allows environments to be preserved, making them interoperable and reproducible in an almost system-agnostic way. KM3NeT utilises Docker (https://www.docker.com) for this task, which is the most popular containerisation solution with high interoperability. Docker containers run with negligible performance overhead and create an isolated environment in a fully reproducible manner, regardless of the host system, as long as Docker itself is supported (Linux, macOS and Windows). These containers are used in the GitLab CI to run test suites in many different configurations. Python-based projects, for example, can easily be tested under different Python versions.
+
+### List of accessible Docker images
+
+ToDo: add list here!
+
+## Python environment
+
+KM3NeT develops open-source Python software for accessing and working with data taken by the detector, produced in simulations or in other analysis pipelines (e.g. event reconstructions), as well as a number of other data types like metadata, provenance history and environmental data. The software follows the Semantic Versioning 2.0 (https://semver.org) conventions, and releases are automatically triggered on the GitLab CI by annotated Git tags. These releases, including alpha and beta releases, are uploaded to the publicly accessible Python Package Index, the main repository of software for the Python programming language. The installation of these packages is as simple as executing `pip install PACKAGE_NAME`. Additionally, the packages can also be installed directly from the GitLab repositories, for example in the case of experimental branches.
+
+### Preferred python packages
+The general philosophy behind all Python packages is to build a bridge to commonly used open-source scientific tools, libraries and frameworks. While the common base is built on [NumPy](https://numpy.org), the de facto standard for scientific, numerical computing in Python, other popular packages from the [SciPy stack](https://www.scipy.org) are highly preferred, as in the short sketch below.
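+
+The sketch below illustrates this interplay with the SciPy stack: a tabular event sample stored in HDF5 is loaded and filtered with Pandas. The file name and column names are hypothetical placeholders, not an actual KM3NeT data product.
+
+```python
+import pandas as pd
+
+# Hypothetical example: the file name and column names are placeholders and do
+# not correspond to an actual KM3NeT data product. It only shows how a tabular
+# event set stored in HDF5 plugs directly into the standard SciPy stack.
+events = pd.read_hdf("open_neutrino_sample.h5", key="events")
+
+# Select upward-going events passing a quality cut and summarise the energy estimate.
+upgoing = events[(events["direction_z"] < 0) & (events["quality_score"] > 0.5)]
+print(upgoing["energy"].describe())
+```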
+Further examples are [Matplotlib](https://matplotlib.org) to create publication-quality plots, [Pandas](https://pandas.pydata.org) to work with tabular data, [Astropy](https://www.astropy.org) for astronomical calculations and [numba](http://numba.pydata.org) for high-performance low-level optimisations.
+
+### Preferred formats for interoperability
+The preferred output formats are CSV and JSON, to maximise interoperability. For larger or more complex datasets, two additional formats are supported. [HDF5](https://www.hdfgroup.org), a widely used data format in science that is accessible from many popular programming languages, is used to store data from every tier, including uncalibrated low-level data and high-level reconstruction summaries. Additionally, and mainly for astronomical data, the [FITS](https://fits.gsfc.nasa.gov) data format is considered where required, due to its high popularity among astronomers.
+
+### Python interface to KM3NeT data
+
+In addition to offering services and data through the KM3NeT Open Data Center (ODC), the [openkm3](https://open-data.pages.km3net.de/openkm3/) Python client was developed to use open data directly in Python on a local computer and within e.g. Jupyter notebooks. It interfaces with the ODC REST API and allows the metadata of the resources and collections to be queried. In addition to that, it offers functions to interpret the data according to its KM3NeT type description ([ktype](Datamodels.md#ktype)), e.g. returning tables in a required format. These interface options will be expanded according to the requirements of the data integrated into the ODC.
+
+Furthermore, basic functions relevant for astrophysics are offered in the [km3astro](https://km3py.pages.km3net.de/km3astro/) package. As development of the Python environment is an ongoing process, the number of packages offered for KM3NeT data interpretation will surely grow in the future.
diff --git a/portal/static/figures/Acoustic_data.png b/portal/static/figures/Acoustic_data.png
new file mode 100644
index 0000000000000000000000000000000000000000..0f977e0d732a6995100a0b92e0e6064acfd19e62
Binary files /dev/null and b/portal/static/figures/Acoustic_data.png differ
diff --git a/portal/static/figures/Data_levels.gif b/portal/static/figures/Data_levels.gif
new file mode 100644
index 0000000000000000000000000000000000000000..8b06502692eb8104ca7415993302d396592022f3
Binary files /dev/null and b/portal/static/figures/Data_levels.gif differ
diff --git a/portal/static/figures/ci-pipelines.png b/portal/static/figures/ci-pipelines.png
new file mode 100644
index 0000000000000000000000000000000000000000..0c0d577c9b28b474445e2f84f177d6226799c38d
Binary files /dev/null and b/portal/static/figures/ci-pipelines.png differ