Unidentified error on looping moon=288 (21/32)

I've made a small change to one of my working submissions which shouldn't have affected any processing. My code runs on my local machine; however, I've submitted this solution twice and both times received an unexplained "command not exited correctly: 1". The training loop and inference loop both run, and I've tried different training intervals, which all work until it reaches moon=288 (21/32). Is this a bug, or is it probable that I have a bug in my code? The error message does not print any error logs, only "command not exited correctly: 1".

The trace is:

  File "/context/code/main.py", line 659, in train
    all_preds, all_trues = TemporalCV(X_train,y_train)
  File "/context/code/main.py", line 507, in TemporalCV
    features_t = feature_extractor.predict(fold_train_X)
...
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[664912,8,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:ConcatV2] name: concat

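OOM stands for Out of Memory. According to the trace, the GPU allocator could not find room for a float tensor of shape [664912, 8, 128] during a concat op reached from feature_extractor.predict(fold_train_X), so the job ran out of GPU memory at that call.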
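To put a number on it, here is a rough sketch that computes the size of the tensor named in the trace and shows one possible workaround: running predict on slices of the input and assembling the result in host memory instead of in one GPU concat. The helper names tensor_bytes and predict_in_chunks and the chunk_size value are my own, not part of TensorFlow, and I am assuming "type float" in the trace means 4-byte float32. Whether this avoids the OOM depends on the TensorFlow version and on how much GPU memory the submission environment has, so treat it as a starting point only.

```python
import numpy as np


def tensor_bytes(shape, itemsize=4):
    """Bytes needed for a dense tensor (itemsize=4 assumes float32)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize


def predict_in_chunks(predict_fn, X, chunk_size=4096):
    """Hypothetical helper: call predict_fn on slices of X and collect
    the outputs in a preallocated NumPy array in host (CPU) memory.
    Returns None if X is empty."""
    out = None
    for start in range(0, len(X), chunk_size):
        part = np.asarray(predict_fn(X[start:start + chunk_size]))
        if out is None:
            # Allocate the full result once, sized from the first chunk.
            out = np.empty((len(X),) + part.shape[1:], dtype=part.dtype)
        out[start:start + len(part)] = part
    return out


# Size in GiB of the tensor from the trace: shape [664912, 8, 128], float32.
print(tensor_bytes((664912, 8, 128)) / 2**30)
```

In the code from the trace, the call might then look like predict_in_chunks(lambda x: feature_extractor.predict(x, batch_size=256), fold_train_X), where batch_size=256 is only an example value. The tensor in the trace needs about 2.7 GB on its own, before counting the model weights and the input, which would explain why the run can pass locally and still fail on a GPU with less free memory.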
