Why does my code always hang at time series #994 during inference?

I am running inference on a batch of time series with a trained model. Everything works fine up to time series #993: each one is processed, classified, and a probability is returned.

But when the loop reaches #994 (task nr. 1212), the code hangs indefinitely. Nothing is printed (I tried print statements and even tqdm, but no output appears). It just sits there until ~15 hours later, when the job fails with a time limit error.

What’s strange is that I can run inference locally on other sets (e.g. 10,001 training series or 101 test series) without any issues. The models are already trained, so this is inference only, with no training involved.

Could there be something special about time series #994 — maybe some pattern or feature that never appeared in the training or test data — that causes the code to freeze?

Has anyone experienced something similar, or do you have advice on how to debug this specific series?

Two things:

  1. The logs are limited to the first 1000 lines and the last 500 lines, so the “hang” is likely just your code printing too much: anything in between is cut off. Progress bars are notorious for that. I suggest printing only once every 1000 datasets, just enough to confirm that the code is still running.
  2. You need to run inference on 10’000 datasets, not just 100 as you did locally (the overview was updated recently). This is likely the cause of your timeout.
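
To make both points concrete, here is a minimal sketch of a loop that prints one line every 1000 datasets and extrapolates the remaining time from the pace so far. The names `run_inference`, `predict` and `log_every` are hypothetical and not part of the competition code; `predict` stands in for whatever function applies your trained model to a single series.

```python
import sys
import time


def run_inference(series_list, predict, log_every=1000, out=sys.stdout):
    """Apply `predict` to every series, logging sparsely.

    `predict` is a placeholder for your own per-series inference function.
    """
    total = len(series_list)
    results = []
    start = time.monotonic()
    for i, series in enumerate(series_list, start=1):
        results.append(predict(series))
        # One line per `log_every` items keeps the log well under the
        # line limit, unlike a progress bar that redraws constantly.
        if i % log_every == 0:
            elapsed = time.monotonic() - start
            # Rough estimate: assumes the remaining series take about as
            # long per item as the ones already processed.
            eta = elapsed / i * (total - i)
            print(
                f"processed {i}/{total} series, "
                f"elapsed {elapsed:.0f}s, estimated remaining {eta:.0f}s",
                file=out,
                flush=True,
            )
    return results
```

With 10’000 datasets this prints only ten lines, and the first of them already tells you whether the full run is likely to fit within the time limit.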