
Graph Normalisation

Closed Jorge Prado requested to merge github/fork/MortenHolmRep/GraphNorm into main

Created by: MortenHolmRep

In an attempt to improve our overall reconstruction predictions, I found the following article. As it turns out, the method is already implemented in PyTorch Geometric, and this PR is a draft implementation of graph normalisation using GraphNorm from PyTorch Geometric.

By normalising the representations of nodes in our graphs, we can alleviate overfitting and stabilise training, ensuring that the network's predictions are not overly influenced by individual nodes with extreme values and instead emphasise the overall graph structure.
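For illustration, here is a minimal sketch of how the PyTorch Geometric layer is applied; the feature dimension and tensors are invented for the example and are not taken from this PR's diff.

```python
# Minimal sketch of applying PyTorch Geometric's GraphNorm per graph in a
# mini-batch; dimensions here are illustrative only.
import torch
from torch_geometric.nn import GraphNorm

num_features = 16  # hypothetical node-feature dimension
norm = GraphNorm(num_features)

# x: node features for a mini-batch of graphs;
# batch: the graph index of each node, so normalisation is done per graph.
x = torch.randn(10, num_features)
batch = torch.tensor([0] * 6 + [1] * 4)  # two graphs with 6 and 4 nodes

out = norm(x, batch)  # same shape as x, normalised within each graph
```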

Merge request reports


Closed by Jorge Prado 1 year ago (Jul 20, 2023 9:02am UTC)

Merge details

  • The changes were not merged into main.

Activity

  • Jorge Prado requested review from @jprado

  • Created by: RasmusOrsoe

    @MortenHolmRep thanks for thinking along these lines!

    I think it's very important that we only introduce modifications to dynedge that we know are improvements in our domain. Therefore, I don't think we can merge this PR before we have concrete evidence that this modification helps in at least one of our applications of dynedge.

    In the paper you reference, they apply this method (along with more classical approaches like BatchNorm) to a series of datasets where they don't scale the data. They use the ADAM optimizer with a linearly decaying learning-rate schedule. They claim two advantages: convergence speed and performance improvement. In the snapshot below you can see their performance improvements, many of which have statistical errors so large that they may not be improvements at all (at least, at a glance, that's what I gather; correct me if I'm wrong). [image: performance-comparison snapshot from the paper]

    I see two places where we could potentially use this:

    • In dynedge, if it actually improves either convergence or performance on a well-known dataset (oscNext level7)
    • In a general Detector class where the scaling is learnable instead of hard-coded. This could be convenient.

    If you wish, I can help you design a proper test of the effects of this proposed change.
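    As a rough sketch of the second bullet, the scaling could be made learnable by wrapping GraphNorm in a preprocessing module; the class below is hypothetical and does not match graphnet's actual Detector interface.

    ```python
    # Hypothetical sketch: a Detector-like preprocessing module whose feature
    # scaling is learned (via GraphNorm) rather than hard-coded. Not graphnet's
    # real Detector API.
    import torch
    from torch_geometric.nn import GraphNorm


    class LearnableScalingDetector(torch.nn.Module):
        def __init__(self, num_features: int) -> None:
            super().__init__()
            # GraphNorm learns a mean scale, a feature scale and a bias,
            # replacing hand-tuned per-feature constants.
            self.norm = GraphNorm(num_features)

        def forward(self, x: torch.Tensor, batch: torch.Tensor) -> torch.Tensor:
            return self.norm(x, batch)
    ```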

  • Created by: MortenHolmRep

    > @MortenHolmRep thanks for thinking along these lines!
    >
    > I think it's very important that we only introduce modifications to dynedge that we know are improvements in our domain. Therefore, I don't think we can merge this PR before we have concrete evidence that this modification helps in at least one of our applications of dynedge.

    Thank you for the review! It is a valid point that we should not add redundant things to the model, and that any addition has to be tested thoroughly.

    > In the paper you reference, they apply this method (along with more classical approaches like BatchNorm) to a series of datasets where they don't scale the data. They use the ADAM optimizer with a linearly decaying learning-rate schedule. They claim two advantages: convergence speed and performance improvement. In the snapshot below you can see their performance improvements, many of which have statistical errors so large that they may not be improvements at all (at least, at a glance, that's what I gather; correct me if I'm wrong).

    I agree with your assessment of the article. In my own quick test for GraphNeT, I saw neither a convergence improvement nor a performance improvement. I would, however, like to set up more rigorous tests rather than a quick duct-taped approach using whatever databases and model configs are lying around. Another point of the article is the effect of batch size on GCNs, which can greatly affect the performance of graph normalisation (and other normalisations); in my test I did not use the full database, and I used the training model's default batch size. So we may see improvements at higher batch sizes.

    > I see two places where we could potentially use this:
    >
    > • In dynedge, if it actually improves either convergence or performance on a well-known dataset (oscNext level7)
    > • In a general Detector class where the scaling is learnable instead of hard-coded. This could be convenient.
    >
    > If you wish, I can help you design a proper test of the effects of this proposed change.

    I welcome the help. It is difficult to find a set of tests that definitively rules out any future efficacy of graph normalisation for the network, e.g. whether it performs better with a different optimizer or together with batch normalisation (which should affect other things like covariate shift, which may have an effect on burn-sample inference). I will look into further tests after I have defended my thesis. So far, I have only done an energy reconstruction on the oscNext level7 database, which did not immediately show any improvement. My next steps would have been to try zenith and azimuth reconstructions. As graph normalisation should improve the generated graph structure, I would also suggest we try Upgrade data, where the noise flag can be turned on and off; that way, we would get a better idea of how it affects graph structure.

    My suggestion would be to create model configurations and dataset configurations and test across a wide selection to see the improvements, if any.
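    One way such a sweep could be organised is sketched below; the helper functions are placeholder stubs, and the dataset names and batch sizes are only illustrative.

    ```python
    # Sketch of a test matrix for the proposed study: otherwise-identical
    # models trained with and without graph normalisation across datasets and
    # batch sizes. The helpers are stubs, not existing graphnet functions.
    from itertools import product


    def build_model(use_graph_norm: bool):
        """Placeholder: would build dynedge, optionally with GraphNorm."""


    def train_and_evaluate(model, dataset: str, batch_size: int) -> float:
        """Placeholder: would train the model and return a validation metric."""


    datasets = ["oscNext_level7", "upgrade_noise_on", "upgrade_noise_off"]
    batch_sizes = [128, 512, 1024]  # batch size may matter for GraphNorm

    results = {}
    for dataset, batch_size, use_graph_norm in product(
        datasets, batch_sizes, (False, True)
    ):
        model = build_model(use_graph_norm=use_graph_norm)
        results[(dataset, batch_size, use_graph_norm)] = train_and_evaluate(
            model, dataset, batch_size
        )
    ```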

  • Created by: asogaard

    I generally agree that we shouldn't add anything if it doesn't add value. (At least when adding new functionality like this; I think there can be value in adding alternatives to existing functionality, e.g., a new type of GNN layer, even if it doesn't beat DynEdge on select tasks.) It seems unclear whether graph normalisation is beneficial — please correct me if this is the wrong read of your discussion below. If that's the case, should we convert this to a draft PR until additional studies have been done, @MortenHolmRep?

  • Created by: MortenHolmRep

    > I generally agree that we shouldn't add anything if it doesn't add value. (At least when adding new functionality like this; I think there can be value in adding alternatives to existing functionality, e.g., a new type of GNN layer, even if it doesn't beat DynEdge on select tasks.) It seems unclear whether graph normalisation is beneficial — please correct me if this is the wrong read of your discussion below. If that's the case, should we convert this to a draft PR until additional studies have been done, @MortenHolmRep?

    @asogaard That is correct, and we should convert it to a draft PR for now. I mainly pulled it out of the draft to get a review (as I assume that is how it worked).

    I have been helping one of my friends with his GNN, where we implemented this graph normalisation. In that setup, we see an improvement for a GNN that does not have a DynEdge structure, but I do not currently see the same immediate improvement in our setup. The bash PR last week was mainly to set up a proper testing environment for various performance-related tests in a structured manner, so some time after my thesis defence I should have a proper test ready.

  • Created by: MortenHolmRep

    This PR is pending the new performance comparison datasets discussed during the meetings.

  • Created by: RasmusOrsoe

    Hey @MortenHolmRep! Is there any news on this?

  • Created by: MortenHolmRep

    > Hey @MortenHolmRep! Is there any news on this?

    I believe I no longer have access to the data required to run the tests. The code works as is though.

    Currently it is uncertain whether a given reconstruction or selection will benefit from graph normalisation; adding to this uncertainty is whether it would perform better as part of an ensemble together with a transformer, as seen in the Kaggle competition. I am puzzled by the initial results from my (admittedly very limited) test on small datasets: I saw no benefit, even though much of the literature suggests there should be one.

    To my understanding, one of the key features of graph normalisation is that it addresses scale variance when a dataset has varying node degrees, as is the case for particle traces and their connectivity in our instance.

    Another thing it should help with is over-smoothing, since it acts as a pre-processing step, in contrast to DynEdge, where edge weights are learned during training and changed dynamically. Normalisation provides a base scale for the edge weights, ensuring that the model does not suffer from scale-variance issues or, for example, gradient explosion. To reiterate, or to provide background on smoothing issues: in GCNs with multiple convolutions, the over-smoothing problem manifests as a loss of distinctiveness in node features as information propagates through the layers. The two methods should therefore be complementary.
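    A minimal sketch of that complementarity is given below, interleaving GraphNorm with a dynamic edge convolution; the class, flag, and layer sizes are hypothetical, and this is not graphnet's actual DynEdge implementation.

    ```python
    # Hypothetical block interleaving a dynamic edge convolution with
    # GraphNorm, disabled by default as in this draft PR.
    import torch
    from torch_geometric.nn import DynamicEdgeConv, GraphNorm


    class NormalisedDynEdgeBlock(torch.nn.Module):
        def __init__(
            self,
            in_dim: int,
            out_dim: int,
            k: int = 8,
            use_graph_norm: bool = False,
        ) -> None:
            super().__init__()
            # DynamicEdgeConv's MLP sees concatenated [x_i, x_j - x_i] features.
            mlp = torch.nn.Sequential(
                torch.nn.Linear(2 * in_dim, out_dim),
                torch.nn.ReLU(),
            )
            self.conv = DynamicEdgeConv(mlp, k=k)
            # Normalising between convolutions keeps node features on a common
            # scale, counteracting over-smoothing as depth grows.
            self.norm = GraphNorm(out_dim) if use_graph_norm else None

        def forward(self, x: torch.Tensor, batch: torch.Tensor) -> torch.Tensor:
            x = self.conv(x, batch)
            if self.norm is not None:
                x = self.norm(x, batch)
            return x
    ```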

    Model training speed and convergence should also be improved by graph normalisation; however, I see this as a lesser benefit, as GraphNeT is already pretty speedy as is.

    Current options as I see it:

    1. Add the functionality and have it disabled by default (currently the case), leaving it up to the user whether to try it on a given model.
    2. Discard the functionality and close the PR.
    3. Run extensive tests on various databases to see where it could give a benefit, and leave instructions for users on where it is useful.
  • closed
