Jutta Schnabel authored
Title: Open data formats
Author: Jutta
status: review
Topics:
- Creation of the open data
- Data format description
- different example data: antares events, plots, orca events, acoustic
Access and Archiving
Open data sets and formats
All of the following data sets are published, among other channels, via the Open Data Center (ODC) and are therefore enriched with metadata following the KM3OpenResource description.
Particle event tables
Data generation
For particle event publication, the full information of each reconstructed event in the data level 2 files is reduced to a "one row per event" format by selecting the relevant parameters. Event and parameter selection, metadata annotation and conversion of the parameters to the intended output format are performed using the km3pipe software. Prototype provenance recording has also been included in this software, so that the output of the pipeline already contains the relevant metadata as well as provenance information. The software can write the data to several formats, including text-based formats and hdf5, which are the two formats used in this demonstrator.
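The reduction to one row per event can be illustrated with a minimal sketch in plain Python. It does not use km3pipe; the nested event layout and all parameter names (`event_id`, `reco`, `zenith` and so on) are assumptions made purely for illustration.

```python
# Hypothetical level-2-like events: nested dictionaries that carry more
# information than is intended for publication.
events = [
    {"event_id": 1, "time": 1500000000.0,
     "reco": {"zenith": 1.2, "azimuth": 0.3, "energy": 45.0, "n_hits": 37},
     "raw": {"hits": [1, 2, 3]}},
    {"event_id": 2, "time": 1500000042.5,
     "reco": {"zenith": 2.1, "azimuth": 4.0, "energy": 310.0, "n_hits": 82},
     "raw": {"hits": [4, 5]}},
]

# Output column -> path into the nested event (assumed names).
SELECTION = {
    "event_id": ("event_id",),
    "time": ("time",),
    "zenith": ("reco", "zenith"),
    "azimuth": ("reco", "azimuth"),
    "energy": ("reco", "energy"),
}


def flatten(event, selection=SELECTION):
    """Return one flat row containing only the selected parameters."""
    row = {}
    for column, path in selection.items():
        value = event
        for key in path:
            value = value[key]
        row[column] = value
    return row


# One row per event, each row still carrying the event identifier.
table = [flatten(event) for event in events]
```

Keeping the selection in a single mapping makes the list of published parameters explicit, which is also the information needed later for the parameter description metadata.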
Data description
Scientific use
Particle event samples can be used in both astrophysics analyses and neutrino oscillation studies, see the KM3NeT science targets. Therefore, the data must be made available in a format suitable for the Virtual Observatory as well as for particle physics studies.
Metadata
The events, from which relevant parameters such as particle direction, time, energy and classification parameters are selected to generate the event table, are enriched with the following metadata.
Metadata type | content |
---|---|
Provenance information | list of processing steps (referenced by identifier) |
Parameter description | parameter name, unit (SI), type, description, identifier |
Data taking metadata | start/stop time, detector, event selection info |
Publication metadata | publisher, owner, creation date, version, description |
Technical specification
Data structure
The general data structure is an event list which can be displayed as a flat table with parameters for one event filling one row. Each event row contains an event identifier.
File format
For the tabulated event data, the output format depends on the publication platform and on the interoperability requirements. The formats currently defined here are not exhaustive and may be extended in the future in response to specific requests from the research community.
For hdf5 output, several options exist to store metadata: several tables can be written to the same file, and both the file itself and each table can hold additional information as attributes. Therefore, metadata that should be easy for the user to find and read is stored in a separate "header" table, while metadata that is more relevant for machine-based interpretation of the data is stored as attributes.
In the case of a text-based table, csv files are generated that are accompanied by a metadata file.
output format | provenance | parameters | data taking | publication |
---|---|---|---|---|
hdf5 | file header | table header | table header | "header" table |
csv table | metadata file | metadata file | metadata file | metadata file |
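For the text-based case, the pairing of a csv table with a metadata file could look like the following sketch. The source does not specify the format of the metadata file, so JSON is an assumption here, as are all key names and the placeholder values.

```python
import csv
import io
import json

rows = [
    {"event_id": 1, "time": 1500000000.0, "energy": 45.0},
    {"event_id": 2, "time": 1500000042.5, "energy": 310.0},
]

# Metadata grouped by the four categories of the table above.
# All keys and values are illustrative placeholders.
metadata = {
    "provenance": ["step-0001", "step-0002"],
    "parameters": [
        {"name": "event_id", "unit": None, "type": "int",
         "description": "event identifier"},
        {"name": "time", "unit": "s", "type": "float",
         "description": "event time"},
        {"name": "energy", "unit": "GeV", "type": "float",
         "description": "reconstructed energy"},
    ],
    "data_taking": {"detector": "example-detector",
                    "event_selection": "placeholder selection"},
    "publication": {"publisher": "example publisher", "version": "0.1"},
}


def write_csv(rows, parameters):
    """Serialise the rows, using the parameter descriptions as column order."""
    fieldnames = [parameter["name"] for parameter in parameters]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = write_csv(rows, metadata["parameters"])   # content of the csv table
metadata_text = json.dumps(metadata, indent=2)        # content of the metadata file
```

Deriving the csv header from the parameter description keeps the table and its metadata file consistent by construction.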
Interfaces
VO server
If the neutrino data set is relevant for astrophysics analyses, a text file is generated and the metadata is mapped to the resource description format required by the DaCHS software, with the Simple Cone Search (SCS) protocol applied to it. In the ODC, the event sample is recorded as a KM3OpenResource pointing to the service endpoints of the VO server. Thus, the data set is findable through both the VO registry and the ODC, and accessible through the access protocols offered by the VO.
KM3NeT Open Data Server
In the current test setup, event files that are not easily interpretable in an astrophysics context, such as the test sample from the ORCA detector, which contains mostly atmospheric muons, are stored on the server and registered as KM3OpenResource. While this practice is acceptable for the current, relatively small data sets, the design of the server also allows it to point to external data sources and to interface with storage locations of extended data samples in the future.
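What a cone search selects can be sketched in a few lines: all events whose position lies within a search radius `sr` of a sky position (`ra`, `dec`), all in degrees. This is only an illustration of the selection logic, not the DaCHS implementation; the event layout with `ra` and `dec` keys is an assumption.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two sky positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(events, ra, dec, sr):
    """Return the events within `sr` degrees of the position (`ra`, `dec`)."""
    return [
        event for event in events
        if angular_separation_deg(ra, dec, event["ra"], event["dec"]) <= sr
    ]


# Placeholder events with equatorial coordinates in degrees.
events = [
    {"event_id": 1, "ra": 10.0, "dec": 20.0},
    {"event_id": 2, "ra": 10.5, "dec": 20.2},
    {"event_id": 3, "ra": 200.0, "dec": -30.0},
]
selected = cone_search(events, ra=10.0, dec=20.0, sr=1.0)
```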
Multimessenger alerts
Data generation
Data generation and scientific use are described in the Multimessenger section. The output of the online reconstruction chain is a set of parameters for the identified event, stored as a JSON dictionary of key: value pairs, which is then annotated with the relevant metadata to match the VOEvent specifications.
Data description
The event information can, depending on its specific use, be divided into the following data or metadata categories.
(Meta)data type | content |
---|---|
Event identification | event identifier, detector |
Event description | type of triggers, IsRealAlert |
Event coordinates | time, right ascension, declination, longitude, latitude |
Event properties | flavor, multiplicity, energy, neutrino type, error box 50%, 90% (TOC), reconstruction quality, probability to be neutrino, probability for astrophysical origin, ranking |
Publication metadata | publisher, contact |
Technical specification
Data structure & format
The VOEvent is stored as an XML file whose central sections are WhereWhen, Who, What, How and Why.
VO Event specifications
Section | Description | (Meta)data |
---|---|---|
<Who> | Publication metadata | including VOEvent stream identifier |
<WhereWhen> | Space-time coordinates | event coordinates offered in UTC (time) and FK5 (equatorial coordinates) and detector location |
<What> | Additional parameters | event properties, event identifier |
<How> | Additional information | description of the alert type |
<Why> | Scientific context | details on the alert procedure |
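The mapping from a flat alert dictionary to the five VOEvent sections can be sketched with the Python standard library. This is a simplified illustration that has not been validated against the VOEvent schema; the dictionary keys, identifiers and values are placeholders, and the selection of parameters written to the What section is an assumption.

```python
import xml.etree.ElementTree as ET

VOE_NS = "http://www.ivoa.net/xml/VOEvent/v2.0"
ET.register_namespace("voe", VOE_NS)


def build_voevent(alert):
    """Map a flat alert dictionary onto the five central VOEvent sections."""
    root = ET.Element(
        f"{{{VOE_NS}}}VOEvent",
        {"ivorn": alert["ivorn"], "role": "test", "version": "2.0"},
    )

    # Who: publication metadata.
    who = ET.SubElement(root, "Who")
    ET.SubElement(who, "AuthorIVORN").text = alert["author_ivorn"]

    # What: event properties and event identifier as name/value parameters.
    what = ET.SubElement(root, "What")
    for name in ("event_id", "energy", "is_real_alert"):
        ET.SubElement(what, "Param", {"name": name, "value": str(alert[name])})

    # WhereWhen: time in UTC and equatorial coordinates in FK5.
    where_when = ET.SubElement(root, "WhereWhen")
    location = ET.SubElement(
        ET.SubElement(where_when, "ObsDataLocation"), "ObservationLocation"
    )
    ET.SubElement(location, "AstroCoordSystem", {"id": "UTC-FK5-GEO"})
    coords = ET.SubElement(
        location, "AstroCoords", {"coord_system_id": "UTC-FK5-GEO"}
    )
    instant = ET.SubElement(ET.SubElement(coords, "Time"), "TimeInstant")
    ET.SubElement(instant, "ISOTime").text = alert["time"]
    position = ET.SubElement(coords, "Position2D", {"unit": "deg"})
    value = ET.SubElement(position, "Value2")
    ET.SubElement(value, "C1").text = str(alert["ra"])
    ET.SubElement(value, "C2").text = str(alert["dec"])

    # How and Why: free-text descriptions.
    how = ET.SubElement(root, "How")
    ET.SubElement(how, "Description").text = alert["alert_type"]
    why = ET.SubElement(root, "Why")
    ET.SubElement(why, "Description").text = alert["procedure"]

    return ET.tostring(root, encoding="unicode")


# Placeholder alert; none of these values correspond to a real event.
alert = {
    "ivorn": "ivo://example.org/alerts#1",
    "author_ivorn": "ivo://example.org/alerts",
    "event_id": 1, "energy": 12.5, "is_real_alert": False,
    "time": "2020-01-01T00:00:00", "ra": 123.4, "dec": -5.6,
    "alert_type": "placeholder description of the alert type",
    "procedure": "placeholder details on the alert procedure",
}
xml_text = build_voevent(alert)
```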
Interfaces
Alerts are received and sent via the GCN. The alert data consist of the neutrino candidates in VOEvent format, the standard data format in which experiments report and communicate observed transient celestial events to facilitate follow-up observations. The alerts are distributed via Comet, an implementation of the VOEvent Transport Protocol.
Beyond this, other receivers can also be implemented but are less convenient, e.g. the TNS for optical alerts, the ZTF/LSST broker for optical transients, or the Fermi flare advocates for Fermi blazar outbursts.
For public alerts, KM3NeT will also submit notices and circulars (with a human in the loop) for dissemination.
Supplementary services and data derivatives
Data generation
Providing context information on a broader scale alongside the VO-published data sets, e.g. in the form of sensitivity services and instrument response functions, is still under investigation and depends strongly on the specific information. Additional metadata is therefore required to interpret the format.
Data description
Scientific use
Models and theoretical background information used in the analysis are provided to allow a statistical interpretation of the data sets, e.g. as accompanying data sets (as for the ANTARES example data set). Alternatively, probability functions that describe theoretical predictions and are drawn from simulations are considered for publication, including e.g. instrument response functions.
Metadata
Metadata here must be case specific:
- Description of the structure of the data (e.g. binned data, formula), which will be indicated by a content descriptor ktype and accompanied by type-specific additional metadata
- Description of the underlying data set from which the information is derived, its scope in time and relevant restrictions on its domain, e.g. a description of the simulation sample
- Description of all relevant parameters
Technical specification
Data structure & format
The data is provided as a csv table or as json, with the relevant metadata provided alongside the data, either in a separate text file or in a header section.
Interfaces
Interpretation of the plot or service data is provided by the openkm3 package, which loads the data as KM3OpenResource from the ODC and interprets it according to the ktype. The relevant data can then be accessed either as an array or, where applicable, rendered directly to a plot using matplotlib, which can then be edited further.
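The idea of interpreting data according to a content descriptor can be sketched as a simple dispatch table. This is not the openkm3 API: the ktype values `binned` and `table`, the payload keys and the function names are all hypothetical and serve only to illustrate the pattern.

```python
def interpret_binned(payload):
    """Return bin centres and bin contents for histogram-like data."""
    edges = payload["bin_edges"]
    centres = [(low + high) / 2 for low, high in zip(edges[:-1], edges[1:])]
    return {"x": centres, "y": list(payload["contents"])}


def interpret_table(payload):
    """Return the rows of a table as one list per column."""
    return {
        name: [row[index] for row in payload["rows"]]
        for index, name in enumerate(payload["columns"])
    }


# Hypothetical ktype values mapped to the function that understands them.
INTERPRETERS = {
    "binned": interpret_binned,
    "table": interpret_table,
}


def interpret(resource):
    """Select the interpreter from the resource's ktype and apply it."""
    ktype = resource["ktype"]
    if ktype not in INTERPRETERS:
        raise ValueError(f"unsupported ktype: {ktype!r}")
    return INTERPRETERS[ktype](resource["data"])


histogram = interpret({
    "ktype": "binned",
    "data": {"bin_edges": [0, 1, 2], "contents": [5, 7]},
})
```

The arrays returned by such an interpreter could then be passed on to a plotting library for rendering.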
Acoustic hydrophone data
Data generation
Acoustic data acquisition, as described in the sea science section, provides a continuous stream of digitized acoustic data that will undergo a filtering process according to the scientific target of the audio data. At this point, the raw acoustic data before filtering can be offered as example data and to researchers interested in sea science. Snippets of acoustic data with a duration of a few minutes are produced at a fixed interval and, after format conversion, offered directly from a data server that is integrated in the acoustic data acquisition system and accessible through a REST API. Integrating this data stream into the open science system is therefore a good example of how to use a data stream that is offered externally to the ODC and comprises a growing number of individual data sets.
Data description
Scientific use
After triggering and filtering, the hydrophone data can be used for acoustic neutrino detection, detector positioning calibration and the identification of marine acoustic signals, e.g. those originating from whales. In unfiltered form, the acoustic data might primarily be of interest for sea science.
Metadata
- Publication metadata is added during record creation at the ODC
- Instrumentation & data taking settings are offered for each data package through a separate endpoint (/info) of the REST API.
Technical specification
Data structure & format
Each data package contains the same audio data, recorded in a custom binary format (raw) and additionally converted to wave and mp3 audio files. In addition, statistical properties of the audio snippet are offered in a separate stream.
format | endpoint | description | return format |
---|---|---|---|
raw | /raw | custom binary format | application/km3net-acoustic |
mp3 | /mp3 | mpeg encoded data | audio/mpeg |
wave | /wav | wave format data | application/octet-stream |
psd | /psd | array with mean, median, 75% and 95% quantile | application/json |
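A client could encode the table above as a small lookup, as in the following sketch. The base URL is a placeholder and the helper name is hypothetical; how an individual data package is addressed below each endpoint is not specified here and is therefore left out.

```python
from urllib.parse import urljoin

# Format name -> (endpoint path, expected return format), taken from the table.
ENDPOINTS = {
    "raw": ("/raw", "application/km3net-acoustic"),
    "mp3": ("/mp3", "audio/mpeg"),
    "wave": ("/wav", "application/octet-stream"),
    "psd": ("/psd", "application/json"),
}


def endpoint_url(base_url, fmt):
    """Return the endpoint URL and the expected content type for a format."""
    path, content_type = ENDPOINTS[fmt]
    # Normalise the slashes so that the endpoint is appended to the base path.
    url = urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))
    return url, content_type


# Placeholder base URL, not the address of the actual data server.
url, content_type = endpoint_url("https://acoustic.example.org/api", "wave")
```

Checking the `Content-Type` header of a response against the expected value is a cheap way to detect a mismatch between the requested and the delivered format.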
Interfaces
For each file, a KM3OpenResource is registered in the ODC. All resources belonging to the same data type are grouped using the KM3ResourceStream metadata class, which points to all resources of the data stream through their kid unique identifiers. All streams belonging to the acoustic data service are grouped in a KM3ResourceCollection. Thus, each single resource can be addressed individually while the logical connection between the resources is preserved.
The data is directly accessible through the ODC webpage views or, from a Python interface, using openkm3 as client.
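The grouping of resources into streams and collections can be pictured with a minimal data model. The classes below are not the actual ODC metadata classes; apart from `kid`, all class names, field names and values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    kid: str   # unique identifier of the resource
    url: str   # where the file can be retrieved


@dataclass
class Stream:
    kid: str
    resource_kids: list = field(default_factory=list)  # resources of one data type


@dataclass
class Collection:
    kid: str
    stream_kids: list = field(default_factory=list)    # streams of one service


def resolve(collection, streams, resources):
    """Return {stream kid: [Resource, ...]} for every stream in the collection."""
    return {
        stream_kid: [resources[kid] for kid in streams[stream_kid].resource_kids]
        for stream_kid in collection.stream_kids
    }


# Placeholder registry: two mp3 files and one psd file of the same service.
resources = {
    "res-1": Resource("res-1", "https://example.org/a.mp3"),
    "res-2": Resource("res-2", "https://example.org/b.mp3"),
    "res-3": Resource("res-3", "https://example.org/a.json"),
}
streams = {
    "stream-mp3": Stream("stream-mp3", ["res-1", "res-2"]),
    "stream-psd": Stream("stream-psd", ["res-3"]),
}
acoustic = Collection("coll-acoustic", ["stream-mp3", "stream-psd"])

grouped = resolve(acoustic, streams, resources)
```

Because streams and collections only hold identifiers, every resource stays individually addressable while the grouping is kept as separate metadata.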