Error message in Prediction
I ran orcapred after the training. However, right after prediction starts, I get the following error message:
2021-02-02 22:22:43.467399: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-02-02 22:22:52.890515: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-02-02 22:22:52.901478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:af:00.0 name: Tesla V100-PCIE-32GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 31.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-02-02 22:22:52.901542: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-02-02 22:22:52.972837: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-02-02 22:22:53.015587: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-02-02 22:22:53.047106: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-02-02 22:22:53.130056: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-02-02 22:22:53.149480: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-02-02 22:22:53.308665: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-02-02 22:22:53.311726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-02-02 22:22:53.317254: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-02 22:22:53.339971: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2200000000 Hz
2021-02-02 22:22:53.340317: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5a0eef0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-02-02 22:22:53.340349: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-02-02 22:22:53.507777: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5a2e9b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-02-02 22:22:53.507830: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-PCIE-32GB, Compute Capability 7.0
2021-02-02 22:22:53.511271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:af:00.0 name: Tesla V100-PCIE-32GB computeCapability: 7.0
coreClock: 1.38GHz coreCount: 80 deviceMemorySize: 31.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-02-02 22:22:53.511336: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-02-02 22:22:53.511369: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-02-02 22:22:53.511391: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-02-02 22:22:53.511413: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-02-02 22:22:53.511431: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-02-02 22:22:53.511450: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-02-02 22:22:53.511469: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-02-02 22:22:53.516323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-02-02 22:22:53.519152: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-02-02 22:22:55.733367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-02 22:22:55.733426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-02-02 22:22:55.733440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-02-02 22:22:55.736476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30122 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:af:00.0, compute capability: 7.0)
Using orga label modifier: bg_classifier_2_class
Using orga dataset modifier: bg_classifier_2_class
Using orga custom objects
Automatically set epoch to epoch 4 file 1.
Loading saved model: saved_models/model_epoch_4_file_1.h5
Creating temporary file /sps/km3net/users/adomi/GNNs/Output/FULL_ARCAv5/bg/test/predictions/temp_pred_model_epoch_4_file_1_on_listARCA_val_file_3.h5_02-02-2021-21-22-58
Predicting in step 0/51 (0.00%)
2021-02-02 22:22:59.618998: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
Traceback (most recent call last):
File "/pbs/home/a/adomi/mypython/bin/orcapred", line 11, in <module>
load_entry_point('orcanet', 'console_scripts', 'orcapred')()
File "/sps/km3net/users/adomi/GNNs/OrcaNet/orcanet_contrib/parser_orcapred.py", line 117, in main
fileno=fileno)
File "/sps/km3net/users/adomi/GNNs/OrcaNet/orcanet_contrib/parser_orcapred.py", line 82, in orca_pred
pred_filepath_conc = orga.predict(epoch=epoch, fileno=fileno,
File "/sps/km3net/users/adomi/GNNs/OrcaNet/orcanet/core.py", line 343, in predict
self, model, epoch, fileno, samples=samples)
File "/sps/km3net/users/adomi/GNNs/OrcaNet/orcanet/backend.py", line 374, in make_model_prediction
h5_inference(orga, model, files_dict, pred_filepath, samples=samples)
File "/sps/km3net/users/adomi/GNNs/OrcaNet/orcanet/backend.py", line 280, in h5_inference
y_pred = model.predict_on_batch(info_blob["xs"])
File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1788, in predict_on_batch
outputs = predict_function(iterator)
File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 814, in _call
results = self._stateful_fn(*args, **kwds)
File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
cancellation_manager=cancellation_manager)
File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
ctx=ctx)
File "/pbs/home/a/adomi/mypython/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: input must have at least k columns. Had 10, needed 11
[[{{node functional_1/get_edge_features_disjoint/map/while/body/_1/functional_1/get_edge_features_disjoint/map/while/TopKV2}}]]
[[Func/functional_1/get_edge_features_disjoint_2/map/while/body/_61/input/_143/_202]]
(1) Invalid argument: input must have at least k columns. Had 10, needed 11
[[{{node functional_1/get_edge_features_disjoint/map/while/body/_1/functional_1/get_edge_features_disjoint/map/while/TopKV2}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_predict_function_3376]
Function call stack:
predict_function -> predict_function
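In case it helps with the diagnosis, here is a minimal sketch of how I read the TopKV2 message. It rests on an assumption I have not verified in the OrcaNet code: that the edge feature layer picks the k nearest neighbours of each node within one event, so an event with only 10 nodes cannot supply the 11 entries requested. The helper names `top_k` and `events_too_small` and the node counts are hypothetical and only mimic the check in plain Python; they are not part of OrcaNet or TensorFlow.

```python
import heapq


def top_k(row, k):
    """Return the k largest values of row (hypothetical stand-in for TopKV2).

    Raises the same kind of complaint as the log when the row is too short.
    """
    if len(row) < k:
        raise ValueError(
            f"input must have at least k columns. Had {len(row)}, needed {k}"
        )
    return heapq.nlargest(k, row)


def events_too_small(node_counts, k):
    """Return the indices of events that have fewer than k nodes."""
    return [i for i, n in enumerate(node_counts) if n < k]


# Hypothetical number of nodes (hits) per event in one batch.
node_counts = [25, 10, 40, 11]
k = 11  # value taken from "needed 11" in the error message

small = events_too_small(node_counts, k)
print("events with too few nodes:", small)

# An event with 10 nodes gives rows of length 10, which is too short for k = 11.
try:
    top_k(list(range(10)), k)
except ValueError as err:
    message = str(err)
    print(message)
```

If this reading is right, a scan like `events_too_small` over the validation file should point to the events that trigger the failure.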
Do you have an idea where the problem might be, @sreck, @dguderian? Please let me know if you need further information.