Use explainability (e.g., PyG Explainer) on a model trained to differentiate between data and Monte Carlo, to see which inputs drive the discrepancy
Created by: asogaard
Suggested steps:
- Curate comparable data and Monte Carlo datasets (e.g., atmospheric muons with similar selections and energy profiles).
- Train a binary classification model to label events as belonging to one of the two classes (data or Monte Carlo). The degree to which the classification model is able to make this distinction indicates how large the differences between data and Monte Carlo are.
- If the model is able to perform this classification, use, e.g., PyG Explainer to identify which features in the input events are the most important for the model's ability to do so.
- Modify the input events (feature engineering) based on the outcome of the above points so as to mitigate the differences between data and Monte Carlo, and thus reduce the model's ability to distinguish between the two classes.
- Repeat this process until the model is unable to distinguish between the two classes. The resulting feature engineering can then be used in other learning tasks and should lead to models that are insensitive to the underlying data/Monte Carlo discrepancies.
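The full loop above can be sketched in a minimal, self-contained form. This is not the PyG-based implementation: it uses toy tabular samples with a hypothetical mismodelling injected into one feature, a logistic-regression classifier as the data-vs-MC model, and scikit-learn's permutation importance as a simple model-agnostic stand-in for PyG Explainer. The feature names, the injected shift, and the mean-alignment "feature engineering" step are all illustrative assumptions.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Toy stand-ins for the two samples: three input features, with a
# hypothetical mismodelling injected into feature 1 of the "MC" sample.
data = rng.normal(size=(n, 3))
mc = rng.normal(size=(n, 3))
mc[:, 1] += 0.8  # assumed data/MC discrepancy, for illustration only

def data_mc_auc(data, mc):
    """Train a data-vs-MC classifier; return (clf, X_test, y_test, AUC)."""
    X = np.vstack([data, mc])
    y = np.concatenate([np.zeros(len(data)), np.ones(len(mc))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf, X_te, y_te, roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# Step 2: if the classifier separates the samples (AUC >> 0.5),
# there is a data/MC discrepancy.
clf, X_te, y_te, auc_before = data_mc_auc(data, mc)

# Step 3: identify which feature drives the separation (permutation
# importance as a stand-in for PyG Explainer on a graph model).
imp = permutation_importance(
    clf, X_te, y_te, scoring="roc_auc", n_repeats=10, random_state=0
)
worst = int(np.argmax(imp.importances_mean))

# Step 4: feature engineering -- here, simply align the offending
# feature's mean in MC to the one observed in data.
mc_fixed = mc.copy()
mc_fixed[:, worst] -= mc_fixed[:, worst].mean() - data[:, worst].mean()

# Step 5: retrain; AUC near 0.5 means the classifier can no longer
# tell data from MC, i.e. the discrepancy has been mitigated.
_, _, _, auc_after = data_mc_auc(data, mc_fixed)
print(f"AUC before: {auc_before:.3f}, after: {auc_after:.3f}, feature: {worst}")
```

In practice the classifier would be a GNN on event graphs and the explanation step would use `torch_geometric.explain`, but the stopping criterion is the same: iterate until the data-vs-MC AUC is consistent with 0.5.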