Issues with AWS

My code runs fine on Google Colab. I submit it to the platform and it starts running on AWS, but it halts after 15 minutes, in the middle of one of the test datasets! It freezes, and then eventually times out after 14 hours, but it literally freezes in 25 minutes or less. The code is fine, no error, just a freeze!
Also, a couple of times I got a strange label on my submission, “BAD PREDICTION”, which says “Different ID(S)”, despite the fact that the code ran perfectly fine. Can you please explain what this error means?

Could you tell me the Run ID?

Run 18985 is the “BAD PREDICTION” / “Different ID(S)” one.

Run 19248 was one stuck one. I had to dumb down a model to one-batch training only, so that it can run in 6 seconds per dataset. Then I submitted it for a test run, but it stopped shortly after, maybe after 35 datasets and 10 minutes (excluding 4-5 minutes to run the code). Not sure. The model is simple and runs fine on Google Colab. I am getting very frustrated. In fact, I just tried to access my submissions, and they are not really available either.

You have 9399 IDs that should not be included.

Here is the list:
run_18985_bad_ids.txt (101.0 KB)

It just looks like you didn’t have enough quota.
The last tqdm printout showed that the run had already been going for 1h07m09s; if you include the run setup and data loading, the 1h20m timeout is fair.

The quota refresh is today. If you wait a few hours, you will have access to 15 hours of compute.

You are right, the output of the run is not accessible, because a participant could use it to leak the dataset.


You seem to be printing a lot.
After 1500 lines, the logs are no longer recorded. This is to prevent spam, as some runs would be unreadable because of all the logs.

I am not sure why I am getting unwanted IDs. I have not changed the mechanical part of the code where the IDs are read, the graph is created, and then from the graph a series of 1s and 0s is written into `predictions[f'{dataset_id}_{i}_{j}'] = int(G.loc[i, j])`. Given that I haven’t manipulated dataset_id, why do I have 9399 IDs that shouldn’t be there?
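For reference, the writing loop is essentially this (a minimal sketch; the function wrapper and the toy adjacency matrix are illustrations, only the key format and the `G.loc[i, j]` lookup are from my actual code):

```python
import pandas as pd

def write_predictions(dataset_id, G):
    """Flatten a pandas adjacency matrix G (index/columns = node labels)
    into prediction entries keyed as '{dataset_id}_{i}_{j}'."""
    predictions = {}
    for i in G.index:
        for j in G.columns:
            predictions[f"{dataset_id}_{i}_{j}"] = int(G.loc[i, j])
    return predictions

# Toy example: a 2-node graph with a single edge a -> b.
G = pd.DataFrame([[0, 1], [0, 0]], index=["a", "b"], columns=["a", "b"])
print(write_predictions("00003", G))
```

The keys come straight from whatever labels `G.index` and `G.columns` carry, which is why the dataset_id itself is never touched.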

I was printing to debug my code; the submitted code doesn’t have to print. Regarding leaking the data: isn’t the test data already provided to us? We can read the test data, and I thought it is the data that the scores are calculated on?

Regarding the 15-hour computation quota: do we get 15 hours per day, or is 15 hours for 3 days, or one week, or...? I need more, as I want to test all the existing algorithms in the gcastle package, some algorithms from other packages, and also some from Papers with Code, and I will also need a lot of computational power to test multiple mixing and blending algorithms. 15 hours per week is absolutely not enough. Is it 15 hours per day?

Just by looking at 00003_1_0 (the very first one in the list of bad IDs), there is no node #0 or #1 in dataset 00003.

The same is true for 00004_1_7.
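One way to catch this before submitting is to compare the node pair in each key against the nodes that actually exist in the dataset. A minimal sketch (the helper name and the `dataset_nodes` mapping are assumptions; only the `{dataset_id}_{i}_{j}` key format is from the thread):

```python
def find_bad_ids(predictions, dataset_nodes):
    """Return prediction keys whose node pair does not exist in the
    corresponding dataset. Keys look like '{dataset_id}_{i}_{j}';
    dataset_nodes maps dataset_id -> set of valid node labels.
    Assumes labels themselves contain no underscores."""
    bad = []
    for key in predictions:
        dataset_id, i, j = key.split("_")
        nodes = dataset_nodes.get(dataset_id, set())
        if i not in nodes or j not in nodes:
            bad.append(key)
    return bad

# If dataset 00003 only has nodes 2 and 3, then 00003_1_0 is flagged.
dataset_nodes = {"00003": {"2", "3"}}
predictions = {"00003_1_0": 1, "00003_2_3": 0}
print(find_bad_ids(predictions, dataset_nodes))
```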

Of the 47,000 datasets in the competition, only half are provided for the submission phase.
Your code will discover the other half during the out-of-sample phase.
To ensure that your code is ready, the current data is again split 80-20, with the 20% not directly downloadable, but only available in the runner.

No, it is 15 hours per week.

I see why I got those extra IDs. My mistake. So was the reason my code sometimes froze shortly after starting that I had used up the 15-hour weekly quota? When does this 15-hour window start and end?

Yes, you likely didn’t see that the run got killed because of the timeout and just continued waiting for it.

Once a run is stopped, the logs also stop being refreshed, so lines that arrive a bit late only appear after a page refresh.

Sorry for the confusion.

No worries. I think you have changed something on the platform now. In the past I used to make a submission and watch it being executed, but now, as soon as I submit, it kicks me out and says “Access Denied”. Is it a new feature or something customized for my account?

By the way, I got frozen again, this time after 12 minutes, and after running only 15 or so of the test datasets.

It is Run #19354.

How do you want to submit?
Through the website? Or via the Crunch CLI?

I cannot find the root cause of the problem. It is probably related to your code, since other participants’ runs are fine.
Without the ability to see and debug it, there is not much I can do.

I will submit a notebook.