Add .root to .sqlite converter

Jorge Prado requested to merge github/fork/kaareendrup/essnusb into main

Created by: kaareendrup

.root to .sqlite conversion

This is an initial attempt at implementing a converter class (and accompanying extractor classes) for .root to .sqlite conversion. The implementation follows the structure of the current .sqlite converter, with some exceptions and caveats described below.

The following are the notable differences from the I3 to .sqlite converter:

No multiprocessing

For .root files, the computational bottleneck is submitting dataframes to the .sqlite database, so multiprocessing is not implemented. Conversion is instead done batchwise to avoid having the entire database in memory at once.
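As a rough illustration of the batchwise idea (not the code in this MR), the sketch below groups input files into chunks of `nb_files_to_batch` and commits each chunk to the database before reading the next. It uses only the standard-library `sqlite3` module; `batched`, `read_rows`, `convert` and the `truth` table layout are hypothetical stand-ins, and plain tuples take the place of the dataframes the real extractors would produce.

```python
import sqlite3
from typing import Iterator, List, Sequence, Tuple


def batched(paths: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` file paths."""
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


def read_rows(path: str) -> List[Tuple[int, float]]:
    """Hypothetical stand-in for extracting rows from one .root file.

    It only derives a fake event number from the file name, so that the
    sketch runs without any real input files.
    """
    event_no = int(path.rsplit("_", 1)[-1].split(".")[0])
    return [(event_no, 0.5 * event_no)]


def convert(paths: Sequence[str], db_path: str, nb_files_to_batch: int = 20) -> int:
    """Write all files to `db_path` one batch at a time; return rows written."""
    n_written = 0
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS truth (event_no INTEGER, energy REAL)"
        )
        for batch in batched(paths, nb_files_to_batch):
            # Only the rows of the current batch are held in memory.
            rows = [row for path in batch for row in read_rows(path)]
            conn.executemany("INSERT INTO truth VALUES (?, ?)", rows)
            conn.commit()
            n_written += len(rows)
    finally:
        conn.close()
    return n_written
```

With this layout, peak memory scales with the batch size rather than with the total number of input files, at the cost of one commit per batch.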

No pattern batching

This should be relatively easy to implement, but I didn't understand it, so I chose not to implement it for now.

No inheritance from current classes

Perhaps the most important one. Since the current base classes have some I3-specific functionality (like attaching gcd-files to lists of I3 files), the new classes don't inherit from them. This could perhaps be a basis for restructuring the converter and extractor classes, if the goal is to provide more general functionality for different experiments.
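To make the restructuring idea a bit more concrete, here is one possible shape it could take. This is purely a sketch and not the existing graphnet class hierarchy: `Extractor`, `I3Extractor`, `RootExtractor`, `set_gcd_file` and `main_key` are all hypothetical names, and a nested dict stands in for an opened .root file. The point is only that the shared base class carries nothing experiment-specific, while gcd-file handling lives in the I3 branch.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class Extractor(ABC):
    """Hypothetical experiment-agnostic base: a name and a call interface."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def __call__(self, source: Any) -> Dict[str, List[Any]]:
        """Return a mapping from column name to values for `source`."""


class I3Extractor(Extractor):
    """Hypothetical I3 branch: the only place that knows about gcd-files."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.gcd_file: Optional[str] = None

    def set_gcd_file(self, gcd_file: str) -> None:
        self.gcd_file = gcd_file

    def __call__(self, source: Any) -> Dict[str, List[Any]]:
        return {}  # placeholder; real frame handling is out of scope here


class RootExtractor(Extractor):
    """Hypothetical .root branch: selects a tree by its key."""

    def __init__(self, name: str, main_key: str) -> None:
        super().__init__(name)
        self.main_key = main_key

    def __call__(self, source: Any) -> Dict[str, List[Any]]:
        # `source` is a nested dict standing in for an opened .root file.
        return dict(source[self.main_key])
```

A converter written against `Extractor` alone would then not need to know which experiment its extractors belong to.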

There are probably many other things that I have missed or misunderstood, but I thought it would be good to join the forces of those who have worked hard on the implementation of the current classes with those who have an eye for root functionality.

Use

The classes use the uproot package and awkward pandas (which may need to be added to the requirements). A simple conversion example (for ESSnuSB) could look like:

from graphnet.data.sqlite.root_dataconverter import rootSQLiteDataConverter
from graphnet.data.extractors.rootfeatureextractor import (
    rootFeatureExtractorESSnuSB,
    rootTruthExtractorESSnuSB,
    rootfiTQunExtractorESSnuSB
)

CONVERTER_CLASS = {
    "sqlite": rootSQLiteDataConverter,
}

def main_essnusb(backend: str) -> None:
    """Convert root files to intermediate `backend` format."""
    # Check(s)
    assert backend in CONVERTER_CLASS

    inputs = ["/path/to/files"]
    outdir = "/path/to/db"
    name = "db_name"

    converter: rootSQLiteDataConverter = CONVERTER_CLASS[backend](
        [
            rootFeatureExtractorESSnuSB('pulsemap_name', 'pulsemap_key'),
            rootTruthExtractorESSnuSB('truth_name', 'truth_key'),
            rootfiTQunExtractorESSnuSB('reco_name', 'reco_key'),
        ],
        outdir,
        name,
        nb_files_to_batch=20,
    )
    converter(inputs)

main_essnusb("sqlite")

I'm unsure how well this will generalize to different tree structures, but combining different main_keys and branch_keys should give a lot of flexibility.