diff --git a/figures/DataMC_reco.png b/figures/DataMC_reco.png new file mode 100644 index 0000000000000000000000000000000000000000..87267f48816115cc946024ab76401c26691db9cf Binary files /dev/null and b/figures/DataMC_reco.png differ diff --git a/figures/DataMC_trigger.png b/figures/DataMC_trigger.png new file mode 100644 index 0000000000000000000000000000000000000000..ffe5a70ead16e523de8548f7859dec0a835d6602 Binary files /dev/null and b/figures/DataMC_trigger.png differ diff --git a/figures/MeanRate_PMT.png b/figures/MeanRate_PMT.png new file mode 100644 index 0000000000000000000000000000000000000000..cc3390ce2f06fb6498816ca7e0d3273217f379e3 Binary files /dev/null and b/figures/MeanRate_PMT.png differ diff --git a/portal/content/articles/data.md b/portal/content/articles/data.md index 7b2729cb7aa8cebc9f5de8a456336f174913c339..bf881227fd7fd0abdb7f6d2a4f68eeaecb0d0c33 100644 --- a/portal/content/articles/data.md +++ b/portal/content/articles/data.md @@ -37,7 +37,7 @@ Although this already leads to a reduced data volume, these neutrino data sets a ## Open data sets and formats -As all of the following data is published, inter alia, via the Open Data Center, the data sets are all enriched with metadata following the [KM3OpenResource description](Datamodels.md#resource-description). +As all of the following data is published, inter alia, via the Open Data Center, the data sets are all enriched with metadata following the [KM3OpenResource description]({{< ref "Experts.md#resource-description" >}}). ### Particle event tables @@ -48,7 +48,7 @@ For particle event publication, the full information from data level 2 file reco **Scientific use** -Particle event samples can be used in both astrophyics analysis as well as neutrino oscillation studies, see the [KM3NeT science targets](ScienceTargets.md). Therefore, the data must be made available in a format suitable for the Virtual Observatory as well as for particle physics studies. +Particle event samples can be used in both astrophysics analyses and neutrino oscillation studies; see the [KM3NeT science targets]({{< ref "Science.md#scientific-targets" >}}). Therefore, the data must be made available in a format suitable for the Virtual Observatory as well as for particle physics studies. **Metadata** @@ -64,7 +64,7 @@ The events, from which relevant *parameters* like particle direction, time, ener #### Technical specification ##### Data structure -The general data structure is an event list which can be displayed as a flat table with parameters for one event filling one row. Each event row contains an [event identifier](Datamodels.md#particle-event-identifiers). +The general data structure is an event list which can be displayed as a flat table with parameters for one event filling one row. Each event row contains an [event identifier]({{< ref "Experts.md#identifiers-and-content-description" >}}). ##### File format For the tabled event data, various output formats are used depending on the platform used for publication and the requirements for interoperability. The formats defined at the moment here are not exclusive and might be extended according to specific requests from the research community in the future. @@ -89,7 +89,7 @@ In the current test setup, event files that are not easily interpretable in an a ### Multimessenger alerts #### Data generation -Data generation and scientific use have been described in [the Multimessenger section](Multimessenger.md).
The output of the online reconstruction chain is an array of parameters for the identified event as json *key: value* dictionary, which then is annotated with the relevant metadata to match the [VOEvent specifications](https://ivoa.net/documents/VOEvent/20110711/index.html). +Data generation and scientific use have been described in [the Multimessenger section]({{< ref "Science.md#multi-messenger-astrophysics" >}}). The output of the online reconstruction chain is an array of parameters for the identified event as a JSON *key: value* dictionary, which is then annotated with the relevant metadata to match the [VOEvent specifications](https://ivoa.net/documents/VOEvent/20110711/index.html). #### Data description @@ -140,7 +140,7 @@ Models and theoretical background information used in the analysis are provided, **Metadata** Metadata here must be case specific: -* Description of the *structure of the data* (e.g. binned data, formula), which will be indicated by a content descriptor [ktype](Datamodels.md#ktype) and accompanied by type-specific additional metadata +* Description of the *structure of the data* (e.g. binned data, formula), which will be indicated by a content descriptor [ktype]({{< ref "Experts.md#identifiers-and-content-description" >}}) and accompanied by type-specific additional metadata * Description of the *basic data set* from which the information is derived, its scope in time and relevant restraints to the basic domain, e.g. description of the simulation sample * Description of all relevant *parameters* @@ -154,7 +154,7 @@ Interprestation of the plot or service data is provided using the *openkm3* pack ### Acoustic hydrophone data #### Data generation -Acoustic data aquisition as described in the [the sea science section](SeaScience.md#acoustic-data) offers a continuous data stream of digitized acoustic data that will undergo a filtering process according to the scientific target of the audio data. At this point, the raw acoustic data before filtering can be offered as example data and to researchers interested in sea science. Snippets of acoustic data with a duration of a few minutes are produced at a fixed interval and directly offered, after format conversion, from a data server integrated in the acoustic data acquisitioning system and made accessible through a REST API. Integrating this data stream in the open science system therefore offers a good example on demonstrating the use of a data stream offered externally to the ODC and with a growing number of individually data sets. +Acoustic data acquisition as described in [the sea science section]({{< ref "Science.md#sea-science" >}}) offers a continuous data stream of digitized acoustic data that will undergo a filtering process according to the scientific target of the audio data. At this point, the raw acoustic data before filtering can be offered as example data and to researchers interested in sea science. Snippets of acoustic data with a duration of a few minutes are produced at a fixed interval and directly offered, after format conversion, from a data server integrated in the acoustic data acquisition system and made accessible through a REST API. Integrating this data stream into the open science system therefore offers a good example of handling a data stream that is served externally to the ODC and comprises a growing number of individual data sets.
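A minimal sketch of how such a snippet could be retrieved from the REST API is shown below. The base URL, endpoint layout and field names are illustrative assumptions, not the published interface of the acoustic data service (which is documented in the data description below and is also accessible via *openkm3*).

```python
import requests

# Illustrative only: base URL and endpoint paths are assumptions, not the
# actual acoustic data service API.
BASE_URL = "https://example.km3net.org/acoustic-api"

# Query the (hypothetical) index of available snippets.
index = requests.get(f"{BASE_URL}/snippets", timeout=30)
index.raise_for_status()
latest = index.json()[-1]  # assume entries are ordered in time

# Download the most recent snippet in one of the offered formats, e.g. wav.
audio = requests.get(f"{BASE_URL}/snippets/{latest['id']}/wav", timeout=300)
audio.raise_for_status()
with open("acoustic_snippet.wav", "wb") as f:
    f.write(audio.content)
```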
#### Data description @@ -178,7 +178,7 @@ Each data package consists of the same audio data, recorded in custom binary for | psd | /psd | array with mean, median, 75% and 95% quantile | application/json | ##### Interfaces -For each file, a KM3OpenResource is registered in the ODC. All resources belonging to the same data type are grouped using the [KM3ResourceStream](Datamodels.md#datamodels) as metadata class, pointing to all resources of the data stream through the kid unique identifier. All streams belonging to the acoustic data service are grouped as KM3ResourceCollection. Thus, +For each file, a KM3OpenResource is registered in the ODC. All resources belonging to the same data type are grouped using the [KM3ResourceStream]({{< ref "Experts.md#datamodels" >}}) as metadata class, pointing to all resources of the data stream through the kid unique identifier. All streams belonging to the acoustic data service are grouped as KM3ResourceCollection. Thus, each single resource can be addressed as well as the logical connection between the resources preserved. The data is directly accessible through the ODC webpage views or using openkm3 as client from a python interface. diff --git a/portal/content/articles/experts.md b/portal/content/articles/experts.md index eaf1676285d95abf352caff87315e54c7f1ac8e9..227072043224814086bf6c13fa3f21c0c857d002 100644 --- a/portal/content/articles/experts.md +++ b/portal/content/articles/experts.md @@ -23,9 +23,9 @@ In all these fields, the standards of KM3NeT are currently developing. In this d The FAIR principles provide a solid set of requirements for the development of an open data regime. Following the FAIR requirements, the following solutions have been established in KM3NeT to enable FAIR data sharing and open science. #### Findable data -* **[Unique identifiers](pages/Datamodels.md#identifiers-and-content-description)** have been defined for digital objects within KM3NeT, including data files, software, collections and workflow steps, as well as identifiers for relevant data sets like particle detection ("event") in the detector. -* At the publication level, extended **[metadata sets](pages/Datamodels.md#resource-description)** are assigned to each published data product. -* The datasets can both be accessed via UID directly on the data servers as well as through external community-relevant **[repositories](pages/Repositories.md)**. +* **[Unique identifiers]({{< ref "#identifiers-and-content-description" >}})** have been defined for digital objects within KM3NeT, including data files, software, collections and workflow steps, as well as identifiers for relevant data sets like particle detection ("event") in the detector. +* At the publication level, extended **[metadata sets]({{< ref "#resource-description" >}})** are assigned to each published data product. +* The datasets can both be accessed via UID directly on the data servers as well as through external community-relevant **[repositories]({{< ref "#registries-and-archiving" >}})**. #### Accessible data * The data can, at this point, be directly accessed via a webpage and through a REST-API where data cannot be offered through VO protocols. @@ -34,7 +34,7 @@ The FAIR principles provide a solid set of requirements for the development of a #### Interoperable data * Vocabularies and content descriptors are introduced that draw on external standards like VO standards or W3C standards where possible. -* **[Documentation on the metadata](pages/Datamodels.md)** and vocabularies are provided. 
+* **[Documentation on the metadata]({{< ref "#datamodels" >}})** and vocabularies are provided. * Metadata classes are well-connected to allow the cross-referencing between different digital objects and extended metadata. #### Reusable data @@ -42,7 +42,6 @@ The FAIR principles provide a solid set of requirements for the development of a * Basic **provenance** information is provided with the data, which serves as a development start point to propagate provenance management through the complex data processing workflow in the future. - ## Datamodels Metadata definition lies at the core of FAIR data, as it governs both the understanding of the data and as well as the interoperablity through access protocols. While some software can be used almost as-is, especially regarding the well-developed interfaces in the Virtual Observatory, the different data types and science fields that KM3NeT can link into requires a flexible approach and diverse application of software. In order to meet these various requirements, metadata and class definitions are developed within KM3NeT, drawing on well established standards e.g. of the [W3 Consortium](https://www.w3.org/), scientific repositories or the IVOA standards. @@ -106,7 +105,7 @@ Entries to the VO registry are annotated using the [Resource Metadata standards] **Format conversion** -In order to transform event-based data to a VO-compatible format, [standard scripts](https://git.km3net.de/open-data/voserver) have been set up to convert [neutrino event tables](Dataformats.md#particle-event-tables) and the according metadata into VO-compatible format and add the required metadata to the data set. To publish a dataset, the procedure includes +In order to transform event-based data to a VO-compatible format, [standard scripts](https://git.km3net.de/open-data/voserver) have been set up to convert [neutrino event tables]({{< ref "data.md#particle-event-tables" >}}) and the according metadata into VO-compatible format and add the required metadata to the data set. To publish a dataset, the procedure includes * labeling the data set according to VO standards with information about origin, authorship, parameter formats and standardized event identifiers. In the DaCHS software, this is handled through a resource description file, to which information from the KM3OpenResource description is cast. @@ -114,7 +113,7 @@ authorship, parameter formats and standardized event identifiers. In the DaCHS s **Publication procedure** -Publication involves uploading and listing the data set as available in the registry of the server, which is handled through simple [administration commands]() in DaCHS. +Publication involves uploading and listing the data set as available in the registry of the server, which is handled through simple administration commands in DaCHS. #### Interlink to registries @@ -122,7 +121,7 @@ In order to declare the entries of the local server to the [IVOA Registry of Reg #### Example data -As KM3NeT does not produce astrophysics data yet, an alternative data sample from the [ANTARES experiment](https://antares.in2p3.fr/) was used to set up the system. The ANTARES collaboration has already published two data sets to the VO by using the services of the German Astrophysical Virtual Observatory (GAVO) which runs the DaCHS software and is maintainer of it. The most recent public data sample was only made available through the ANTARES website and thus did not match the FAIR criteria. 
Using the [ANTARES 2007-2017 point source neutrino sample](Usecase_ANTARES.md), the KM3NeT VO server could be registered to the VO and hosts this data set as example data. +As KM3NeT does not produce astrophysics data yet, an alternative data sample from the [ANTARES experiment](https://antares.in2p3.fr/) was used to set up the system. The ANTARES collaboration has already published two data sets to the VO by using the services of the German Astrophysical Virtual Observatory (GAVO) which runs the DaCHS software and is maintainer of it. The most recent public data sample was only made available through the ANTARES website and thus did not match the FAIR criteria. Using the [ANTARES 2007-2017 point source neutrino sample]({{< ref "getting-started.md#antares-2007-2017-point-source-analysis" >}}), the KM3NeT VO server could be registered to the VO and hosts this data set as example data. #### Data access Tabulated high-level neutrino event data can be accessed utilizing access protocols like the Table Access Protocol (TAP) and query languages like the Astronomical Data Query Language (ADQL). To query these data sets related to astronomical sources, the Simple Cone Search (SCS) protocol allows to pick specific events according to particle source direction. @@ -211,33 +210,38 @@ Both PyPI and Zenodo copy the relevant software to their platforms, copies of th ## Data Quality assessment -The processes involved in the KM3NeT data processing chain can be grouped into a few main categories. Although the ordering of these categories is not strictly hierarchycal from the point of view of data generation and processing, in the context of a general discussion one could safely assume that a hierarchycal relation exists between them. From bottom to top, these categories would be data acquisition, detector calibration, event reconstrucion, simulations and finally scientific analyses based on the data processed in the previous categories. The quality of the scientific results produced by KM3NeT will be affected by the performance of the different processes involved in the lower levels of the data processing chain. In order to implement a complete and consistent set of data quality control procedures that span the whole data processing chain, it is required to have a complete, unambiguous and documented strategy for data processing at each of the aforementioned process categories. This includes the setting of data quality criteria which should be initiated at the highest level of the data processing chain, and propagated towards the lowest levels. For each of the aforementioned categories there exists a working group within the KM3NeT collaboration. It therefore corresponds to each of these working groups to develop the working group objectives and procedures according to the scientific needs of KM3NeT. Currently such a documented strategy does not exist for any working group. It has therefore not been possible to develop a full strategy for data quality control. Nevertheless, there have been copious software developments devoted to quality control along the different stages of the data processing chain. In the following, a description of some of the existing quality control tools and procedures is given. This description could be conceived as an incomplete prototype for a data quality plan. The implementation of these procedures into an automated workflow requires the design and implementation of a standardised data processing workflow which meets software quality standards. This does not exist either. 
Some of the figures and results shown here have been produced ad-hoc, and not as a result of any working system. +The processes involved in the KM3NeT data processing chain can be grouped into a few main categories. Although the ordering of these categories is not strictly hierarchical from the point of view of data generation and processing, in the context of a general discussion one could safely assume that a hierarchical relation exists between them. From bottom to top, these categories would be data acquisition, detector calibration, event reconstruction, simulations and finally scientific analyses based on the data processed in the previous categories. The quality of the scientific results produced by KM3NeT will be affected by the performance of the different processes involved in the lower levels of the data processing chain. In order to implement a complete and consistent set of data quality control procedures that span the whole data processing chain, it is required to have a complete, unambiguous and documented strategy for data processing at each of the aforementioned process categories. This includes the setting of data quality criteria which should be initiated at the highest level of the data processing chain, and propagated towards the lowest levels. For each of the aforementioned categories there exists a working group within the KM3NeT collaboration, in which a quality strategy is required and is in the process of being established. It is therefore not possible to provide a full setup for data quality control at this point. Nevertheless, there have been copious software developments devoted to quality control along the different stages of the data processing chain. In the following, a description of some of the existing quality control tools and procedures is given. This description could be conceived as an incomplete prototype for a data quality plan. The implementation of these procedures into an automated workflow requires the design and implementation of a standardised data processing workflow which meets software quality standards; such a workflow does not yet exist either. Some of the figures and results shown here have been produced ad-hoc, and not as a result of any working system. ### Data quality control procedures #### Online Monitor - -During the data acquisition process, the online monitoring software presents real time plots that allow the shifters to promptly identify problems with the data acquisition. It includes an alert system that sends notifications to the shifters if during the data taking, problems appear that require human intervention. The online monitor uses the same data that are stored for offline analyses (this is actually not true, and should be changed). This implies that any anomaly observed during the detector operation can be reproduced offline. {HERE, A FIGURE COULD BE GIVEN AS AN EXAMPLE. FOR INSTANCE, THE MESSAGES ON THE CHAT WHEN THE TRIGGER RATE IS 0.} +During the data acquisition process, the online monitoring software presents real-time plots that allow the shifters to promptly identify problems with the data acquisition. It includes an alert system that sends notifications to the shifters if, during data taking, problems appear that require human intervention. The online monitor is intended to use the same data that are stored for offline analyses, so that any anomaly observed during the detector operation can be reproduced offline; this is not yet fully the case in the current implementation.
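A purely illustrative sketch of the kind of check such an alert system performs is given below; the monitored quantity, threshold and notification mechanism are invented for illustration and do not correspond to the actual online monitoring code.

```python
from typing import Optional

# Illustrative only: the threshold value and message format are assumptions.
def check_trigger_rate(rate_hz: float, minimum_hz: float = 1.0) -> Optional[str]:
    """Return a warning message if the monitored trigger rate looks anomalous."""
    if rate_hz < minimum_hz:
        return (
            f"Trigger rate {rate_hz:.2f} Hz below {minimum_hz:.2f} Hz - "
            "shifter intervention needed"
        )
    return None

warning = check_trigger_rate(0.0)
if warning is not None:
    print(warning)  # the real system would notify the shifters instead of printing
```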
#### Detector Operation +As explained in the detector section, the optical data obtained from the detector operation are stored in ROOT files and moved to a high performance storage environment. The offline data quality control procedures start with a first analysis of these files which is performed daily. It mainly focuses on but is not restricted to the summary data stored in the ROOT files. The summary data contain information related to the performance of the data acquisition procedures for each optical module in the detector. As a result of this first analysis, a set of key-value pairs is produced where each key corresponds to a parameter that represents a given dimension of data quality and the value represents the evaluation of this parameter for the live-time of the analysed data. The results are tagged with a unique identifier corresponding to the analysed data set and uploaded to the database. In the present implementation the analysis is performed for each available file where each file corresponds to a data taking run, although this may change in the future as the data volume generated per run will increase with the detector size. + + -As explained in XX.YY, the optical data obtained from the detector operation are stored in ROOT files and moved to a high performance storage environment. The offline data quality control procedures start with a first analysis of these files which is performed daily. It mainly focuses on but is not restricted to the summary data stored in the ROOT files. The summary data contain information related to the performance of the data acquisition procedures for each optical module in the detector. As a result of this first analysis, a set of key-value pairs is produced where each key corresponds to a parameter that represents a given dimension of data quality and the value represents the evaluation of this parameter for the livetime of the analysed data. The results are tagged with a unique identifier corresponding to the analysed data set and uploaded to the database. In the present implementation the analysis is performed for each available file where each file corresponds to a data taking run, although this may change in the future as the data volume generated per run will increase with the detector size. -A further analysis of the results stored in the database includes the comparison of the values of the different parameters to some reference values, allowing for a classification of data periods according to their quality. The reference values are typically set according to the accuracy with which the current detector simulations include the different quality parameters. In addition, the evolution of the different quality parameters can be monitored and made available to the full collaboration as reports. Currently this is done every week by the shifters, and the reports are posted on an electronic log book (ELOG). Figure {FIG}, shows an example of the time evolution for two quality parameters during the period corresponding to the data sample that are provided together with this report. The selected runs correspond to a period of stable rates and during which the different quality parameters were within the allowed tolerance. +A further analysis of the results stored in the database includes the comparison of the values of the different parameters to some reference values, allowing for a classification of data periods according to their quality. 
The reference values are typically set according to the accuracy with which the current detector simulations include the different quality parameters. In addition, the evolution of the different quality parameters can be monitored and made available to the full collaboration as reports. Currently this is done every week by the shifters, and the reports are posted on an electronic log book (ELOG). The figure above shows an example of the time evolution for a quality parameter during the period corresponding to the data sample that are provided together with this report. The selected runs correspond to a period of stable rates and during which the different quality parameters were within the allowed tolerance. -#### Calibration +### Calibration -The first step in the data processing chain is to determine the detector calibration parameters using the data obtained from the detector operation. These parametrers include the time offsets of the PMTs as well as their gains and efficiencies, the positions and orientations of the optical modules. The PMT time offsets and the positions of the optical modules are used in later stages of the data processing chain for event reconstruction, as well as by the real time data filter during the detector operation. While the event reconstruction requires an accurate knowledge of these parameters, the algorithms used by the real time data filter depend rather losely on them, and its performance is not dependent on variations occuring within a timescale of the order of months. Nevertheless, it is still necessary to monitor them and correct the values used by the data filter if necessary. The performance of the detector operation also depends on the response of the PMTs, which is partly determined by their gains. These evolve over time, and they can be set to their nominal values through a tuning of the high-voltage applied to each PMT. Monitoring the PMT gains is therefore also necessary to maximise the detector performance. Additionally, the PMT gains and efficiencies are also used offline by the detector simulation. Within the context of data quality assesment, software tools have been developed by KM3NeT that allow to monitor the parameters described above and to compare them to reference values, raising alerts when necessary. The reference values should be determined by the impact of miscalibrations on the scientific goals of KM3NeT and this work has not been adressed. The arrangement of these tools into a workflow requires the ellaboration of an underlying calibration strategy. This has not been done, and the work is therefore on hold. +The first step in the data processing chain is to determine the detector calibration parameters using the data obtained from the detector operation. These parameters include the time offsets of the PMTs as well as their gains and efficiencies, the positions and orientations of the optical modules. The PMT time offsets and the positions of the optical modules are used in later stages of the data processing chain for event reconstruction, as well as by the real time data filter during the detector operation. While the event reconstruction requires an accurate knowledge of these parameters, the algorithms used by the real time data filter depend rather loosely on them, and its performance is not dependent on variations occurring within a timescale of the order of months. Nevertheless, it is still necessary to monitor them and correct the values used by the data filter if necessary. 
-#### Simulations and Event reconstruction +The performance of the detector operation also depends on the response of the PMTs, which is partly determined by their gains. These evolve over time, and they can be set to their nominal values through a tuning of the high-voltage applied to each PMT. Monitoring the PMT gains is therefore also necessary to maximise the detector performance. Additionally, the PMT gains and efficiencies are also used offline by the detector simulation. Within the context of data quality assessment, software tools have been developed by KM3NeT that allow to monitor the parameters described above and to compare them to reference values, raising alerts when necessary. The reference values should be determined by the impact of mis-calibrations on the scientific results of KM3NeT, a task which at this point has been started to be addressed. The arrangement of these tools into a workflow requires the elaboration of an underlying calibration strategy. This has not been done, and the work is therefore on hold. -Once the calibration constnats have been determined, the data processing chain continues with the event reconstruction, and with the simulation and reconstruction of an equivalent set of events where the information in the summary data is used to simulate the data taking conditions. The simulation of particle interactions and propagation is done by dedicated software, while the detector simulation and event reconstruction is done by Jpp. As a result of the simulation chain, a ROOT file is obtained which has the same format as the ROOT file produced by the data acquisition system. This contains events obtained after the simulation of the detector trigger. The resulting file and the corresponding data taking file, are identically processed by the reconstruction software, which produces ROOT formatted files with the result of reconstructing the real data events and the simulated events respectively. The comparison between data and simulations is an important parameter to measure the quality of the data and it can be done at trigger level, and at reconstruction level. In both cases, the comparison follows the same strategy: the root files are used to produce histograms of different observables and these histograms are saved into new ROOT files. A set of libraries and applications devoted to histogram comparisons have been developped in Jpp. These implement multiple statistical tests that can be used to determine if two histograms are compatible, as well as the degree of incompatibility between them. Additionally, tools have been developed that allow to summarise the reuslts into a number per file, which represents the average result after comparing all the observables. For the example provided here, the discrepancy between data and montecarlo is measured through the calculation of the reduced chi2 for each observable, and the summary is given as the average reduced chi2 of all the compared observables for each file. Figures {X} and {Y} show the value of this parameter as a function of the run number. +### Simulations and event reconstruction +Once the calibration constants have been determined, the data processing chain continues with the event reconstruction, and with the simulation and reconstruction of an equivalent set of events where the information in the summary data is used to simulate the data taking conditions. 
The simulation of particle interactions and propagation is done by dedicated software, while the detector simulation and event reconstruction is done by Jpp. As a result of the simulation chain, a ROOT file is obtained which has the same format as the ROOT file produced by the data acquisition system. This contains events obtained after the simulation of the detector trigger. The resulting file and the corresponding data taking file, are identically processed by the reconstruction software, which produces ROOT formatted files with the result of reconstructing the real data events and the simulated events respectively. The comparison between data and simulations is an important parameter to measure the quality of the data and it can be done at trigger level, and at reconstruction level. -These tools would also allow for the calculation of other data quality metrics by comparing the data with figures of merit for different observables. For this, the development of plans and strategies mentioned in the introduction of this section is necessary. +In both cases, the comparison follows the same strategy: the root files are used to produce histograms of different observables and these histograms are saved into new ROOT files. A set of libraries and applications devoted to histogram comparisons have been developed in Jpp. These implement multiple statistical tests that can be used to determine if two histograms are compatible, as well as the degree of incompatibility between them. Additionally, tools have been developed that allow to summarise the results into a number per file, which represents the average result after comparing all the observables. For the example provided here, the discrepancy between data and Monte Carlo is measured through the calculation of the reduced χ2 for each observable, and the summary is given as the average reduced χ2 of all the compared observables for each file. The following figures show the value of this parameter for the data and simulations comparisons at trigger and reconstruction levels. -The contents in the files produced by the reconstruction routines are the main ingredients for the physics analyses, though these analyses are typically done with dedicated software frameworks which require a special formatting of the data. The data format used for physics analyses is called 'aanet' format, which is also based in ROOT. Control mechanisms are thus needed to ensure consistency between the files produced by Jpp, and the aanet formatted files. Consistency between Jpp and aanet files can be verified up to a certain level by producing distributions of the different parameters and by verifying that these distributions are identical. + + +These tools would also allow for the calculation of other data quality metrics by comparing the data with figures of merit for different observables. For this, the development of plans and strategies mentioned in the introduction of this section is necessary. +The contents in the files produced by the reconstruction routines are the main ingredients for the physics analyses, though these analyses are typically done with dedicated software frameworks which require a special formatting of the data. The data format used for physics analyses is called ‘aanet’ format, which is also based in ROOT. Control mechanisms are thus needed to ensure consistency between the files produced by Jpp, and the aanet formatted files. 
Consistency between Jpp and aanet files can be verified up to a certain level by producing distributions of the different parameters and by verifying that these distributions are identical. ## Copyright and Licensing diff --git a/portal/content/articles/getting-started.md b/portal/content/articles/getting-started.md index 361f52e6a7d9b9fbcaf05b66914f77cbec7b90f4..81031ee4055ba459c1ce2ed7cd88b6bbda3ae778 100644 --- a/portal/content/articles/getting-started.md +++ b/portal/content/articles/getting-started.md @@ -21,21 +21,21 @@ The products offered from the various servers include not only data, but also so #### Data sets Only [high-level data sets]({{< ref "Data.md#high-level-data-and-data-derivatives" >}}) and derivatives are offered in the open science platforms. These include: -* Particle events from **[optical neutrino detection](Detector.md#data-acquisition)**, which is the prinary data production channel for KM3NeT. -* **[Multi-messenger alert data](Multimessenger.md)** which is broadcast for events with high scientific relevance in multi-messenger searches. -* **[Environmental data](SeaScience.md)** from both calibration-relevant data for the deep-sea detector and acoustic data for UHE neutrino detection which is relevant to Sea Science. +* Particle events from **[optical neutrino detection]({{< ref "Science.md#data-acquisition" >}})**, which is the primary data production channel for KM3NeT. +* **[Multi-messenger alert data]({{< ref "Science.md#multi-messenger-astrophysics" >}})** which is broadcast for events with high scientific relevance in multi-messenger searches. +* **[Environmental data]({{< ref "Science.md#sea-science" >}})** from both calibration-relevant data for the deep-sea detector and acoustic data for UHE neutrino detection which is relevant to Sea Science. #### High-level derivatives For the evaluation of the significance of the data for a given analysis target, additional information crucial to the interpretation of the data is published. These include: -* Binned and paramterized information drawn from **[simulations](Simulation.md)**, e.g. instrument response functions or sensitivities. +* Binned and parameterized information drawn from **[simulations]({{< ref "Science.md#detector-and-event-simulations" >}})**, e.g. instrument response functions or sensitivities. * High-level data **summary information** from dedicated analyses, e.g. public plots. #### Software and workflows Software and workflow examples facilitate the processing of the open data and form the basis for community-oriented developments. Offered products are: * **KM3NeT-related software** for data processing, simulation and analysis. -* **[Interface packages](Python.md#python-interface-to-km3net-data)** for data access, currently based on python. +* **[Interface packages]({{< ref "Software.md#python-interface-to-km3net-data" >}})** for data access, currently based on Python. * **Workflow descriptions** as Jupyter notebooks or annotated workflows for data handling. #### Tutorials and Documentation @@ -49,17 +49,17 @@ Documentation is set up to provide a linear step-by-step approach to the use of The products are published through various platforms tailored specifically to the requirements for sharing of the specific product. #### Data servers -* Astrophysics-related data is offered through the **[KM3NeT Virtual Observatory server](VOserver.md)** running the VO-integrated [DaCHS software](https://dachs-doc.readthedocs.io/index.html).
-* All services and data not integrateable into the VO due to their different scientific domain or technical nature are offered through the [KM3NeT Open Data Center](KM3NeTserver.md) offering webpages and a REST-API to query the data products. +* Astrophysics-related data is offered through the **[KM3NeT Virtual Observatory server]({{< ref "Experts.md#the-virtual-observatory-server" >}})** running the VO-integrated [DaCHS software](https://dachs-doc.readthedocs.io/index.html). +* All services and data that cannot be integrated into the VO due to their different scientific domain or technical nature are offered through the [KM3NeT Open Data Center]({{< ref "Experts.md#the-km3net-open-data-center" >}}), which offers webpages and a REST-API to query the data products. #### Software servers -* All software is offered through an KM3NeT-operated **[Gitlab server](Git.md)** which hosts all software and documentation projects. -* Containers with key software are packaged and made available through a **[Docker server](Docker.md)**. +* All software is offered through a KM3NeT-operated **[Gitlab server]({{< ref "Software.md#gitlab" >}})** which hosts all software and documentation projects. +* Containers with key software are packaged and made available through a **[Docker server]({{< ref "Software.md#docker" >}})**. #### Documentation & Courses Documentation is widely spread through the system and generally offered as closely as possible to the actual product. Dedicated platforms offering centralized knowledge access are -* the **[Education portal](Courses.md)** for tutorials and webinars, and +* the **[Education portal]({{< ref "#online-courses" >}})** for tutorials and webinars, and * Gitlab projects and pages, especially the **[Open Science Portal](https://open-data.pages.km3net.de/openscienceportal/)** which documents the KM3NeT open science system. ### Interfaces @@ -67,7 +67,7 @@ Documentation is widely spread through the system and generally offered as close For the end user, various access options to data, software and documentation are offered. #### Repositories -Data is registered to large [centralized data registries](Repositories.md) which make the KM3NeT data findable by the wider community. These registries include +Data is registered to large [centralized data registries]({{< ref "Experts.md#registries-and-archiving" >}}) which make the KM3NeT data findable by the wider community. These registries include * the **[VO registry](https://doi.org/10.1016/j.ascom.2014.07.001)** to which all data sets and services registered through the VO server are pushed, and * the [Zenodo repository](https://zenodo.org/) or alternative solution to integrate with **[DataCite](https://datacite.org/)**. diff --git a/portal/content/articles/science.md b/portal/content/articles/science.md index f55bd24615efbcb5c5e3f9048d48910a37639a47..09389e07d1a959fcf21c249c464831942a9a0d51 100644 --- a/portal/content/articles/science.md +++ b/portal/content/articles/science.md @@ -7,7 +7,7 @@ draft = false ## Scientific Targets -The KM3NeT neutrino detectors will continuously register neutrinos from the whole sky. The neutrinos of astrophysical interest, i.e. those from extra-terrestrial origin, need to be identified in the background of atmospheric neutrinos, i.e. those created in Earth’s atmosphere by interactions of cosmic-ray particles.
Access to cosmic neutrino data is of high importance for a wide astrophysics community to relate cosmic neutrino fluxes to observations by other neutrino observatories or using [other messengers](Multimessengers.md), and to compare them with theoretical predictions. The atmospheric neutrinos carry information on the particle physics processes in which they are created and on the neutrinos themselves. These data are relevant for a wide astroparticle and particle physics community. Finally, KM3NeT will monitor marine parameters, such as bioluminescence, currents, water properties and transient acoustic signals and provides user ports for Earth and Sea sciences. +The KM3NeT neutrino detectors will continuously register neutrinos from the whole sky. The neutrinos of astrophysical interest, i.e. those from extra-terrestrial origin, need to be identified in the background of atmospheric neutrinos, i.e. those created in Earth’s atmosphere by interactions of cosmic-ray particles. Access to cosmic neutrino data is of high importance for a wide astrophysics community to relate cosmic neutrino fluxes to observations by other neutrino observatories or using [other messengers]({{< ref "#multi-messenger-astrophysics" >}}), and to compare them with theoretical predictions. The atmospheric neutrinos carry information on the particle physics processes in which they are created and on the neutrinos themselves. These data are relevant for a wide astroparticle and particle physics community. Finally, KM3NeT will monitor marine parameters, such as bioluminescence, currents, water properties and transient acoustic signals and provides user ports for Earth and Sea sciences. ### Astro Physics @@ -20,7 +20,7 @@ The preferred search strategy is to identify upward-moving tracks, which unambig Besides all-favour neutrino astronomy, i.e. investigating high-energy cosmic neutrinos and identifying their astrophysical sources, additional physics topics of ARCA include -- [multi-messenger studies](Multimessengers.md) +- [multi-messenger studies]({{< ref "#multi-messenger-astrophysics" >}}) - particle physics with atmospheric muons and neutrinos - indirect searches for dark matter @@ -71,7 +71,7 @@ Because neutrinos travel with nearly the speed of the light, a real-time or near #### KM3NeT Multi-Messenger Neutrino Alerts -With the Southern sky including the Galactic Center in view and a good angular resolution, KM3NeT will contribute greatly to the multi-messenger community. For a detailed description of the detector, see [the detector page](Detector.md). +With the Southern sky including the Galactic Center in view and a good angular resolution, KM3NeT will contribute greatly to the multi-messenger community. For a detailed description of the detector, see [the detector page]( {{< ref "science.md#detector" >}}). * For the search of neutrinos via event reconstructions based on (causally coincident) lit-up DOMs, focusing on high energy neutrinos, such as astrophysical neutrinos, KM3NeT will be able to send/receive alerts from/to the multi-messenger community: @@ -92,7 +92,7 @@ The alert types include: * Any neutrinos correlated with external alerts. * Other alerts to be defined, or more subcategories divided from the High-Energy Neutrino alerts if necessary (e.g. track HE, cascade HE alerts). -For alert data formatting and sending, see [the dataformat definition](Dataformats.md#multimessenger-alerts). +For alert data formatting and sending, see [the dataformat definition]({{< ref "data.md#multimessenger-alerts" >}}). 
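To make this concrete, the sketch below shows the kind of *key: value* parameter set that the online reconstruction chain produces for an alert candidate before it is annotated and wrapped into a VOEvent packet. All field names and values are invented for illustration; the authoritative format is the one given in the dataformat definition linked above.

```python
import json

# Illustrative alert candidate; every key and value below is a placeholder,
# not the actual KM3NeT alert schema.
alert_candidate = {
    "event_id": "example-0001",          # placeholder identifier
    "time_utc": "2021-01-01T00:00:00Z",  # detection time
    "ra_deg": 123.4,                     # reconstructed right ascension
    "dec_deg": -45.6,                    # reconstructed declination
    "angular_error_deg": 0.8,            # directional uncertainty estimate
    "energy_estimate_gev": 5.0e4,        # reconstructed energy proxy
    "classification": "track",           # e.g. track- or cascade-like topology
}

# The dictionary is then annotated with metadata (author, stream, references)
# and serialised into a VOEvent packet for distribution to the community.
print(json.dumps(alert_candidate, indent=2))
```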
### Sea Science @@ -108,7 +108,7 @@ The telescope is also equipped with acoustic sensors: 1 piezo-electric ceramic s The default KM3NeT DAQ has been designed to identify only the (known) acoustic signals emitted by a long baseline of acoustic sensors among the full data stream (Figure 2). A further implementation of the DAQ isforeseen in the future, permitting on-line analysis, display and recording of the underwater noise spectrum through a dedicated channel. A large number of use cases has been identified, as described [in this document](https://www.km3net.org/wp-content/uploads/2020/02/D8.1-KM3NeT-rules-of-access-for-external-users.pdf). -![Acoustic data acquisition system, from [2]](/figures/Acoustic_data.png) + At this stage, the unfiltered acoustic data can be made available directly from the ADF, which is provided in several formats through a REST-API on a data server integrated in the acoustic data processing system of KM3NeT. @@ -124,8 +124,6 @@ A collection of 115 strings forms a single KM3NeT building block. The modular de The ARCA (Astroparticle Research with Cosmics in the Abyss) detector is being installed at the KM3NeT-It site, 80km offshore the Sicilian coast offshore to Capo Passero (Italy) at a sea bottom depth of about 3450m. About 1 km^3 of seawater will be instrumented with ∼130000 PMTs. The ORCA (Oscillation Research with Cosmics in the Abyss) detector is being installed at the KM3NeT-Fr site, 40km offshore Toulon (France) at a sea bottom depth of about 2450m. A volume of about 8 Mton is instrumented with ∼65000 PMTs. -Technical details on the detector design are given in [1]. - ### Data Acquisition The readout of the KM3NeT detector is based on the 'all-data-to-shore' concept, in which all analogue signals from the PMTs that pass a reference threshold are digitised. This data contain the time at which the analogue pulse crosses the threshold level, the time that the pulse remains above the threshold level (known as time-over-threshold, or ToT), and the PMT address. This is typically called a *hit*. All digital data (about 25 Gb/s per building block) are sent to a computing farm onshore where they are processed in real time.
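As a purely illustrative sketch of the hit structure described above (the real DAQ and offline software define their own binary and ROOT-based formats), a single hit can be pictured as follows:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hit:
    """Illustrative representation of a single PMT hit; not a KM3NeT data format."""
    pmt_address: int  # identifies the PMT that recorded the pulse
    time_ns: float    # time at which the pulse crossed the threshold
    tot_ns: float     # time-over-threshold: how long the pulse stayed above it

# A triggered event is essentially a collection of causally related hits.
example_event: List[Hit] = [
    Hit(pmt_address=42, time_ns=1005.2, tot_ns=26.1),
    Hit(pmt_address=43, time_ns=1007.9, tot_ns=31.4),
]
print(len(example_event), "hits in the example event")
```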