I've made a small change to one of my working submissions which shouldn't have affected any processing. My code runs on my local machine; however, I've submitted this solution twice and both times received an unexplained "command not exited correctly: 1". The training loop and inference loop both run, and I've tried different intervals of training, which all work until it reaches moon=288 (21/32). Is this a bug, or is it more likely that I have a bug in my code? The error message does not print any error logs, only "command not exited correctly: 1".
The trace is:
File "/context/code/main.py", line 659, in train
all_preds, all_trues = TemporalCV(X_train,y_train)
File "/context/code/main.py", line 507, in TemporalCV
features_t = feature_extractor.predict(fold_train_X)
...
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[664912,8,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:ConcatV2] name: concat
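OOM stands for Out of Memory. According to the trace, the GPU ran out of memory while concatenating the output of feature_extractor.predict into a single float tensor of shape [664912, 8, 128].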
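To put a number on the allocation in the trace, here is a rough sketch that estimates the size of that tensor, plus one possible workaround: running prediction in slices so that no single result has to cover all rows at once. It assumes "float" means 4-byte float32 and that the full-size concatenation is what exceeds GPU memory. The helper names `tensor_bytes` and `predict_in_chunks` are made up for illustration and are not part of TensorFlow or of the original code.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Approximate size of a dense tensor (assumes 4-byte float32 by default)."""
    return prod(shape) * bytes_per_element


def predict_in_chunks(predict_fn, inputs, chunk_size):
    """Call predict_fn on consecutive slices of inputs and collect the rows.

    predict_fn is any callable that maps a slice of inputs to a sequence of
    per-row outputs (hypothetical stand-in for a model's predict method).
    """
    outputs = []
    for start in range(0, len(inputs), chunk_size):
        chunk = inputs[start:start + chunk_size]
        outputs.extend(predict_fn(chunk))
    return outputs


# Size of the tensor named in the OOM message: shape [664912, 8, 128], float.
size = tensor_bytes((664912, 8, 128))
print(f"{size} bytes = {size / 2**30:.2f} GiB")  # 2723479552 bytes = 2.54 GiB
```

In the code from the trace, `predict_fn` might be something like `feature_extractor.predict` applied to slices of `fold_train_X`, so that each call returns a smaller array that ends up in host memory. Whether this is enough depends on how much GPU memory the submission environment provides, which may be less than on a local machine.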