Add .root to .sqlite converter
Created by: kaareendrup
.root to .sqlite conversion
This is an initial attempt at implementing a converter class (and subsequent extractor-classes) for .root to .sqlite conversion. The implementation follows the structure of the current .sqlite converter, with some exceptions and caveats as described in the following.
The following are notable differences from the I3 to .sqlite converter:
No multiprocessing
Since the computational bottleneck for .root files is writing dataframes to the .sqlite databases, multiprocessing is not implemented. Batchwise conversion is implemented to avoid holding the entire database in memory at once.
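To make the batchwise idea concrete, here is a minimal sketch using only the standard library. It is not the code in this PR: `batched`, `read_rows`, `convert`, and the `pulses` table are hypothetical names, and `read_rows` is a stand-in that fabricates a few rows where the real extractors would read a .root file. Only `nb_files_to_batch` mirrors the argument used in the usage example.

```python
import sqlite3
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[int, float]


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def read_rows(path: str) -> List[Row]:
    """Hypothetical stand-in for extracting rows from one .root file."""
    # Placeholder data: three (event_no, charge) rows per file.
    return [(len(path) * 10 + i, 0.5 * i) for i in range(3)]


def convert(paths: Sequence[str], database: str, nb_files_to_batch: int) -> int:
    """Write rows from `paths` to `database`, one batch of files at a time."""
    n_written = 0
    conn = sqlite3.connect(database)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pulses (event_no INTEGER, charge REAL)"
        )
        for batch in batched(paths, nb_files_to_batch):
            # Only the rows of the current batch are held in memory.
            rows = [row for path in batch for row in read_rows(path)]
            conn.executemany("INSERT INTO pulses VALUES (?, ?)", rows)
            conn.commit()
            n_written += len(rows)
    finally:
        conn.close()
    return n_written
```

Committing once per batch keeps memory use bounded by the batch size rather than by the total number of input files, at the cost of writing sequentially.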
No pattern batching
This should be relatively easy to implement, but I didn't fully understand it, so I have left it out for now.
No inheritance from current classes
Perhaps the most important one. Since the current base classes have some I3-specific functionality (like attaching gcd-files to lists of I3 files), the new classes don't inherit from them. This could perhaps be the basis for a restructuring of the converter and extractor classes, if the goal is to provide more general functionality for different experiments.
There are probably many other things that I have missed or misunderstood, but I thought it would be good to join the forces of those who have worked hard on the implementation of the current classes with those who have an eye for root functionality.
Use
The classes use the uproot package and awkward pandas (which may need to be added to the requirements). A simple conversion example (for ESSnuSB) could look like:
from graphnet.data.sqlite.root_dataconverter import rootSQLiteDataConverter
from graphnet.data.extractors.rootfeatureextractor import (
    rootFeatureExtractorESSnuSB,
    rootTruthExtractorESSnuSB,
    rootfiTQunExtractorESSnuSB,
)

CONVERTER_CLASS = {
    "sqlite": rootSQLiteDataConverter,
}


def main_essnusb(backend: str) -> None:
    """Convert root files to intermediate `backend` format."""
    # Check(s)
    assert backend in CONVERTER_CLASS

    inputs = ["/path/to/files"]
    outdir = "/path/to/db"
    name = "db_name"

    converter: rootSQLiteDataConverter = CONVERTER_CLASS[backend](
        [
            rootFeatureExtractorESSnuSB('pulsemap_name', 'pulsemap_key'),
            rootTruthExtractorESSnuSB('truth_name', 'truth_key'),
            rootfiTQunExtractorESSnuSB('reco_name', 'reco_key'),
        ],
        outdir,
        name,
        nb_files_to_batch=20,
    )
    converter(inputs)


main_essnusb("sqlite")
I'm unsure about how well this will generalize to different tree structures, but the combination of different main_keys and branch_keys should give a lot of flexibility.