Issue running training on an ARCA file
I have the following ARCA file:
/sps/km3net/users/adomi/GNNs/training/Muons_vs_Neutrinos_shuffled.h5
which is an 86 GB file with 400k muons and 400k neutrinos. It was obtained from several ARCA files converted with OrcaSong with max_hits=5000.
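For context, the structure of such an OrcaSong HDF5 file can be checked with h5py before training. This is a minimal, self-contained sketch: it builds a tiny dummy file with the per-event shape the model summary below implies (5000 hits, 7 node features); the dataset names `x`/`y` and the event count are illustrative assumptions, not taken from the real file:

```python
import h5py
import numpy as np

# Build a tiny dummy file mimicking the assumed OrcaSong layout.
# Dataset names "x"/"y" and the event count are illustrative assumptions.
with h5py.File("dummy_orcasong.h5", "w") as f:
    f.create_dataset("x", data=np.zeros((8, 5000, 7), dtype=np.float32))  # hits
    f.create_dataset("y", data=np.zeros((8,), dtype=np.float32))          # labels

# Inspect dataset names, shapes and dtypes, as one would on the real file.
with h5py.File("dummy_orcasong.h5", "r") as f:
    for name, ds in f.items():
        print(name, ds.shape, ds.dtype)
```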
When I try to run the training on this file, I get the following error:
> INFO: Could not find any nv files on this host!
> 2021-01-26 15:10:53.223138: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
> 2021-01-26 15:11:01.368746: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
> 2021-01-26 15:11:01.428897: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
> 2021-01-26 15:11:01.428982: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (cca010): /proc/driver/nvidia/version does not exist
> 2021-01-26 15:11:01.430105: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
> To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
> 2021-01-26 15:11:01.445504: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2599990000 Hz
> 2021-01-26 15:11:01.447984: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56acf30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
> 2021-01-26 15:11:01.448019: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
> Using orga label modifier: bg_classifier_2_class
> Using orga dataset modifier: bg_classifier_2_class
> Using orga custom objects
> ------------------------------------------------------------
> ------------------- 2021-01-26 15:11:03 -------------------
> A model has been built using the model builder with the following configurations:
>
> Loss functions:
> bg_output: {'function': 'categorical_crossentropy', 'metrics': ['acc']}
>
>
> Model: "functional_1"
> __________________________________________________________________________________________________
> Layer (type) Output Shape Param # Connected to
> ==================================================================================================
> nodes (InputLayer) [(32, 5000, 7)] 0
> __________________________________________________________________________________________________
> is_valid (InputLayer) [(32, 5000)] 0
> __________________________________________________________________________________________________
> coords (InputLayer) [(32, 5000, 4)] 0
> __________________________________________________________________________________________________
> dense_to_disjoint (DenseToDisjo ((None, 7), (32, 500 0 nodes[0][0]
> is_valid[0][0]
> coords[0][0]
> __________________________________________________________________________________________________
> batch_normalization (BatchNorma (None, 7) 28 dense_to_disjoint[0][0]
> __________________________________________________________________________________________________
> get_edge_features_disjoint (Get ((None, 10, 7), (Non 0 batch_normalization[0][0]
> dense_to_disjoint[0][1]
> dense_to_disjoint[0][2]
> __________________________________________________________________________________________________
> subtract (Subtract) (None, 10, 7) 0 get_edge_features_disjoint[0][0]
> get_edge_features_disjoint[0][1]
> __________________________________________________________________________________________________
> concatenate (Concatenate) (None, 10, 14) 0 get_edge_features_disjoint[0][0]
> subtract[0][0]
> __________________________________________________________________________________________________
> dense (Dense) (None, 10, 64) 896 concatenate[0][0]
> __________________________________________________________________________________________________
> batch_normalization_1 (BatchNor (None, 10, 64) 256 dense[0][0]
> __________________________________________________________________________________________________
> activation (Activation) (None, 10, 64) 0 batch_normalization_1[0][0]
> __________________________________________________________________________________________________
> dense_1 (Dense) (None, 10, 64) 4096 activation[0][0]
> __________________________________________________________________________________________________
> batch_normalization_2 (BatchNor (None, 10, 64) 256 dense_1[0][0]
> __________________________________________________________________________________________________
> activation_1 (Activation) (None, 10, 64) 0 batch_normalization_2[0][0]
> __________________________________________________________________________________________________
> dense_2 (Dense) (None, 10, 64) 4096 activation_1[0][0]
> __________________________________________________________________________________________________
> batch_normalization_3 (BatchNor (None, 10, 64) 256 dense_2[0][0]
> __________________________________________________________________________________________________
> dense_3 (Dense) (None, 64) 448 batch_normalization[0][0]
> __________________________________________________________________________________________________
> activation_2 (Activation) (None, 10, 64) 0 batch_normalization_3[0][0]
> __________________________________________________________________________________________________
> batch_normalization_4 (BatchNor (None, 64) 256 dense_3[0][0]
> __________________________________________________________________________________________________
> tf_op_layer_Mean (TensorFlowOpL [(None, 64)] 0 activation_2[0][0]
> __________________________________________________________________________________________________
> tf_op_layer_AddV2 (TensorFlowOp [(None, 64)] 0 batch_normalization_4[0][0]
> tf_op_layer_Mean[0][0]
> __________________________________________________________________________________________________
> activation_3 (Activation) (None, 64) 0 tf_op_layer_AddV2[0][0]
> __________________________________________________________________________________________________
> get_edge_features_disjoint_1 (G ((None, 10, 64), (No 0 activation_3[0][0]
> dense_to_disjoint[0][1]
> activation_3[0][0]
> __________________________________________________________________________________________________
> subtract_1 (Subtract) (None, 10, 64) 0 get_edge_features_disjoint_1[0][0
> get_edge_features_disjoint_1[0][1
> __________________________________________________________________________________________________
> concatenate_1 (Concatenate) (None, 10, 128) 0 get_edge_features_disjoint_1[0][0
> subtract_1[0][0]
> __________________________________________________________________________________________________
> dense_4 (Dense) (None, 10, 128) 16384 concatenate_1[0][0]
> __________________________________________________________________________________________________
> batch_normalization_5 (BatchNor (None, 10, 128) 512 dense_4[0][0]
> __________________________________________________________________________________________________
> activation_4 (Activation) (None, 10, 128) 0 batch_normalization_5[0][0]
> __________________________________________________________________________________________________
> dense_5 (Dense) (None, 10, 128) 16384 activation_4[0][0]
> __________________________________________________________________________________________________
> batch_normalization_6 (BatchNor (None, 10, 128) 512 dense_5[0][0]
> __________________________________________________________________________________________________
> activation_5 (Activation) (None, 10, 128) 0 batch_normalization_6[0][0]
> __________________________________________________________________________________________________
> dense_6 (Dense) (None, 10, 128) 16384 activation_5[0][0]
> __________________________________________________________________________________________________
> batch_normalization_7 (BatchNor (None, 10, 128) 512 dense_6[0][0]
> __________________________________________________________________________________________________
> dense_7 (Dense) (None, 128) 8192 activation_3[0][0]
> __________________________________________________________________________________________________
> activation_6 (Activation) (None, 10, 128) 0 batch_normalization_7[0][0]
> __________________________________________________________________________________________________
> batch_normalization_8 (BatchNor (None, 128) 512 dense_7[0][0]
> __________________________________________________________________________________________________
> tf_op_layer_Mean_1 (TensorFlowO [(None, 128)] 0 activation_6[0][0]
> __________________________________________________________________________________________________
> tf_op_layer_AddV2_1 (TensorFlow [(None, 128)] 0 batch_normalization_8[0][0]
> tf_op_layer_Mean_1[0][0]
> __________________________________________________________________________________________________
> activation_7 (Activation) (None, 128) 0 tf_op_layer_AddV2_1[0][0]
> __________________________________________________________________________________________________
> get_edge_features_disjoint_2 (G ((None, 10, 128), (N 0 activation_7[0][0]
> dense_to_disjoint[0][1]
> activation_7[0][0]
> __________________________________________________________________________________________________
> subtract_2 (Subtract) (None, 10, 128) 0 get_edge_features_disjoint_2[0][0
> get_edge_features_disjoint_2[0][1
> __________________________________________________________________________________________________
> concatenate_2 (Concatenate) (None, 10, 256) 0 get_edge_features_disjoint_2[0][0
> subtract_2[0][0]
> __________________________________________________________________________________________________
> dense_8 (Dense) (None, 10, 256) 65536 concatenate_2[0][0]
> __________________________________________________________________________________________________
> batch_normalization_9 (BatchNor (None, 10, 256) 1024 dense_8[0][0]
> __________________________________________________________________________________________________
> activation_8 (Activation) (None, 10, 256) 0 batch_normalization_9[0][0]
> __________________________________________________________________________________________________
> dense_9 (Dense) (None, 10, 256) 65536 activation_8[0][0]
> __________________________________________________________________________________________________
> batch_normalization_10 (BatchNo (None, 10, 256) 1024 dense_9[0][0]
> __________________________________________________________________________________________________
> activation_9 (Activation) (None, 10, 256) 0 batch_normalization_10[0][0]
> __________________________________________________________________________________________________
> dense_10 (Dense) (None, 10, 256) 65536 activation_9[0][0]
> __________________________________________________________________________________________________
> batch_normalization_11 (BatchNo (None, 10, 256) 1024 dense_10[0][0]
> __________________________________________________________________________________________________
> dense_11 (Dense) (None, 256) 32768 activation_7[0][0]
> __________________________________________________________________________________________________
> activation_10 (Activation) (None, 10, 256) 0 batch_normalization_11[0][0]
> __________________________________________________________________________________________________
> batch_normalization_12 (BatchNo (None, 256) 1024 dense_11[0][0]
> __________________________________________________________________________________________________
> tf_op_layer_Mean_2 (TensorFlowO [(None, 256)] 0 activation_10[0][0]
> __________________________________________________________________________________________________
> tf_op_layer_AddV2_2 (TensorFlow [(None, 256)] 0 batch_normalization_12[0][0]
> tf_op_layer_Mean_2[0][0]
> __________________________________________________________________________________________________
> activation_11 (Activation) (None, 256) 0 tf_op_layer_AddV2_2[0][0]
> __________________________________________________________________________________________________
> global_avg_pooling_disjoint (Gl (32, 256) 0 activation_11[0][0]
> dense_to_disjoint[0][1]
> __________________________________________________________________________________________________
> bg_output (Dense) (32, 2) 514 global_avg_pooling_disjoint[0][0]
> ==================================================================================================
> Total params: 303,966
> Trainable params: 300,368
> Non-trainable params: 3,598
> __________________________________________________________________________________________________
> Creating directory: /sps/km3net/users/adomi/GNNs/Output/FULL_ARCAv5/bg/test/saved_models
> [model summary printed a second time, identical to the one above]
> Creating directory: /sps/km3net/users/adomi/GNNs/Output/FULL_ARCAv5/bg/test/plots
> /sps/km3net/users/adomi/GNNs/OrcaNet/orcanet/core.py:555: UserWarning: Can not plot model: ('Failed to import pydot. You must `pip install pydot` and install graphviz (https://graphviz.gitlab.io/download/), ', 'for `pydotprint` to work.')
> warnings.warn("Can not plot model: " + str(e))
> Creating directory: /sps/km3net/users/adomi/GNNs/Output/FULL_ARCAv5/bg/test/train_log
> Creating directory: /sps/km3net/users/adomi/GNNs/Output/FULL_ARCAv5/bg/test/plots/activations
> Creating directory: /sps/km3net/users/adomi/GNNs/Output/FULL_ARCAv5/bg/test/predictions
> Creating directory: /sps/km3net/users/adomi/GNNs/Output/FULL_ARCAv5/bg/test/predictions/inference
> [model summary printed a third time, identical to the one above]
> __________________________________________________________________________________________________
> tf_op_layer_AddV2_2 (TensorFlow [(None, 256)] 0 batch_normalization_12[0][0]
> tf_op_layer_Mean_2[0][0]
> __________________________________________________________________________________________________
> activation_11 (Activation) (None, 256) 0 tf_op_layer_AddV2_2[0][0]
> __________________________________________________________________________________________________
> global_avg_pooling_disjoint (Gl (32, 256) 0 activation_11[0][0]
> dense_to_disjoint[0][1]
> __________________________________________________________________________________________________
> bg_output (Dense) (32, 2) 514 global_avg_pooling_disjoint[0][0]
> ==================================================================================================
> Total params: 303,966
> Trainable params: 300,368
> Non-trainable params: 3,598
> __________________________________________________________________________________________________
>
> Input check
> -----------
> The data in the files of the toml list have the following names and shapes:
> points (5000, 17)
>
> After applying your sample modifier, they have the following names and shapes:
> nodes (5000, 7)
> is_valid (5000,)
> coords (5000, 4)
>
> Your model requires the following input names and shapes:
> nodes (5000, 7)
> is_valid (5000,)
> coords (5000, 4)
>
> Input check passed.
>
> Output check
> ------------
> The following 31 label names are in the first file of the toml list:
> event_id, particle_type, energy, is_cc, bjorkeny, dir_x, dir_y, dir_z, time_interaction, run_id, vertex_pos_x, vertex_pos_y, vertex_pos_z, n_hits, weight_w1, weight_w2, weight_w3, n_gen, std_dir_x, std_dir_y, std_dir_z, std_beta0, std_lik, std_n_hits_gandalf, std_pos_x, std_pos_y, std_pos_z, std_energy, std_lik_energy, std_length, group_id
>
> The following 1 labels get produced from them by your label_modifier:
> bg_output
>
> Your model has the following 1 output layers:
> bg_output
>
> Output check passed.
>
> ------------------------------------------------------------
> ------------------- 2021-01-26 15:11:03 -------------------
> Training run started with the following configuration:
>
> Output folder: /sps/km3net/users/adomi/GNNs/Output/FULL_ARCAv5/bg/test/
> List file path: /sps/km3net/users/adomi/GNNs/scripts/listARCA.toml
>
> Given trainfiles in the .list file:
> points:
> /sps/km3net/users/adomi/GNNs/training/Muons_vs_Neutrinos_shuffled.h5
>
> Given validation files in the .list file:
> points:
> /sps/km3net/users/adomi/GNNs/Atmospheric_Muons/ML_mcv5.2.mupage_50T.sirene.jte.jchain.aashower.20.h5
> /sps/km3net/users/adomi/GNNs/NuMuCC/ML_mcv5.1_res.genhen_numuCC.km3_AAv1.jte.jchain.aashower.19.h5
> /sps/km3net/users/adomi/GNNs/NuMuCC/ML_mcv5.1_res.genhen_anumuCC.km3_AAv1.jte.jchain.aashower.10.h5
>
> Settings used:
> batchsize: 32
> learning_rate: <function orca_learning_rates.<locals>.learning_rate at 0x7f6dc2b62b90>
> zero_center_folder: None
> validate_interval: 3
> cleanup_models: False
> class_weight: None
> sample_modifier: <orcanet_contrib.orca_handler_util.GraphSampleMod object at 0x7f6dd2d84a50>
> dataset_modifier: <function orca_dataset_modifiers.<locals>.dataset_modifier at 0x7f6dce2350e0>
> label_modifier: <function orca_label_modifiers.<locals>.label_modifier at 0x7f6ddacb2320>
> key_x_values: x
> key_y_values: y
> custom_objects: {'loss_direction': <function loss_direction at 0x7f6dc2afb3b0>, 'loss_uncertainty_mse': <function loss_uncertainty_mse at 0x7f6dc2afb200>, 'loss_uncertainty_mae': <function loss_uncertainty_mae at 0x7f6dc2afb170>, 'loss_uncertainty_gaussian_likelihood': <function loss_uncertainty_gaussian_likelihood at 0x7f6dc2afb290>, 'loss_uncertainty_gaussian_likelihood_dir': <function loss_uncertainty_gaussian_likelihood_dir at 0x7f6dc2afb320>, 'loss_mean_relative_error_energy': <function loss_mean_relative_error_energy at 0x7f6dc2afb050>, 'loss_log_prob': <function loss_log_prob at 0x7f6dc2af6560>, 'loss_log_prob_laplace': <function loss_log_prob_laplace at 0x7f6dc2af6f80>, 'loss_uncertainty_mre': <function loss_uncertainty_mre at 0x7f6dc2afb0e0>}
> shuffle_train: True
> fixed_batchsize: True
> callback_train: []
> use_scratch_ssd: False
> verbose_train: 1
> verbose_val: 0
> make_weight_plots: False
> n_events: None
> max_queue_size: 10
> train_logger_display: 200
> train_logger_flush: -1
> multi_gpu: True
> y_field_names: None
>
> Training in epoch 1 on file 1/1
> -------------------------------
> Learning rate is at 0.02500000037252903
> Inputs and files:
> points: Muons_vs_Neutrinos_shuffled.h5
> /pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:432: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
> "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
> Traceback (most recent call last):
> File "/pbs/home/a/adomi/mypython/bin/orcatrain", line 11, in <module>
> load_entry_point('orcanet', 'console_scripts', 'orcatrain')()
> File "/sps/km3net/users/adomi/GNNs/OrcaNet/orcanet_contrib/parser_orcatrain.py", line 134, in main
> recompile_model=recompile_model)
> File "/sps/km3net/users/adomi/GNNs/OrcaNet/orcanet_contrib/parser_orcatrain.py", line 121, in orca_train
> orga.train_and_validate(model=model,epochs=int(no_epochs))
> File "/sps/km3net/users/adomi/GNNs/OrcaNet/orcanet/core.py", line 149, in train_and_validate
> self.train(model)
> File "/sps/km3net/users/adomi/GNNs/OrcaNet/orcanet/core.py", line 226, in train
> history = backend.train_model(self, model, next_epoch, batch_logger=True)
> File "/sps/km3net/users/adomi/GNNs/OrcaNet/orcanet/backend.py", line 73, in train_model
> epochs=epoch[0],
> File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
> return method(self, *args, **kwargs)
> File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
> tmp_logs = train_function(iterator)
> File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
> result = self._call(*args, **kwds)
> File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
> return self._stateless_fn(*args, **kwds)
> File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
> return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
> File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
> cancellation_manager=cancellation_manager)
> File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
> ctx, args, cancellation_manager=cancellation_manager))
> File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
> ctx=ctx)
> File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
> inputs, attrs, num_outputs)
> tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [9760,1] vs. [17258,10]
> [[node functional_1/get_edge_features_disjoint/add (defined at pbs/home/a/adomi/mypython/lib/python3.7/site-packages/medgeconv/util_disjoint.py:51) ]] [Op:__inference_train_function_7285]
>
> Errors may have originated from an input operation.
> Input Source operations connected to node functional_1/get_edge_features_disjoint/add:
> functional_1/get_edge_features_disjoint/map/RaggedFromVariant/RaggedTensorFromVariant (defined at pbs/home/a/adomi/mypython/lib/python3.7/site-packages/medgeconv/util_disjoint.py:35)
>
> Function call stack:
> train_function
>
> 2021-01-26 15:11:08.338853: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
> [[{{node PyFunc}}]]
I don't understand what the problem is. Please let me know if you need more information!
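In case it is relevant, here is a minimal sketch of a check I could run on my side. This is only a guess: I am assuming the edge-conv layers need k=10 neighbours per node (inferred from the `(None, 10, ...)` shapes in the summary above) and that the number of valid hits per event is the sum of the `is_valid` mask from the sample modifier. The helper `count_short_events` is just a name I made up for this example:

```python
import numpy as np

K = 10  # assumed kNN neighbour count, guessed from the (None, 10, ...) shapes above


def count_short_events(n_valid_hits, k=K):
    """Count events with fewer valid hits than the k needed to build a kNN graph."""
    n_valid_hits = np.asarray(n_valid_hits)
    return int(np.sum(n_valid_hits < k))


# Toy input; in practice this would be is_valid.sum(axis=1) for batches
# read from Muons_vs_Neutrinos_shuffled.h5.
print(count_short_events([3, 12, 9, 50]))  # -> 2
```

If any events in the shuffled file have fewer than 10 valid hits, could that produce a shape mismatch like the one in the traceback?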
Edited by Stefan Reck