Title: Workflow management
Author: Jutta, Tamas
status: review please

Workflow management

Workflow description

The simulation and raw data processing chains consist of workflows that are executed on heterogeneous clusters of computers. The harmonisation of workflow interfaces is one of the bigger challenges due to the large variety of software being used in individual processing steps. The benefits of a standardised interface lie in clarity and simplified maintenance. The Common Workflow Language (CWL) is currently being evaluated as an abstraction layer to describe analysis workflows and tools. It provides scalability and portability across many different software and hardware environments.
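As an illustration of what such an abstraction layer looks like, the following minimal sketch builds a CWL `CommandLineTool` description in Python and serialises it to JSON, which CWL accepts alongside YAML. The tool name `process_raw`, the input name `raw_file` and the output pattern `*.out` are hypothetical placeholders and do not refer to actual KM3NeT software.

```python
import json


def make_tool(base_command, input_name, output_glob):
    """Return a minimal CWL CommandLineTool description as a plain dict."""
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        # Command to run; a hypothetical executable is passed in by the caller.
        "baseCommand": base_command,
        "inputs": {
            # One input file, passed as the first positional argument.
            input_name: {
                "type": "File",
                "inputBinding": {"position": 1},
            }
        },
        "outputs": {
            # Collect the file matching the glob pattern after the tool has run.
            "processed": {
                "type": "File",
                "outputBinding": {"glob": output_glob},
            }
        },
    }


# Hypothetical processing step: names are placeholders, not real tools.
tool = make_tool(["process_raw"], "raw_file", "*.out")
tool_json = json.dumps(tool, indent=2)
print(tool_json)
```

Because the description only declares the command, its inputs and its outputs, the same document could in principle be handed to any CWL-compliant runner, which is where the portability across environments comes from.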

Computing resource management

The second layer of workflow management targets the distributed processing of data over heterogeneous computing infrastructures. For this purpose, the DIRAC interware has been chosen. It provides a common, seamless interface to a multitude of resource providers such as grids, cloud systems and computing clusters, and thus offers great interoperability. DIRAC will be used to set up a data-driven processing infrastructure for all stages in the pipelines, including the processing of raw data and Monte Carlo simulations.

FAIR workflows

Combining CWL and DIRAC could achieve both the detailed description and modularisation of the individual workflow steps and the data-driven optimisation of the installation and distribution of the workflows, with full control of the processing environment. This would form the backbone of FAIR data processing and would facilitate the development of a solid data provenance scheme in KM3NeT. While the implementation of a full-scale data processing environment goes beyond the scope of this project, this software implementation is currently being explored in the ESCAPE project in cooperation with other ESFRIs.